npub163k0g…whljx the problem is the LLM dataset. The input data must be nearly 100% ...

2023-10-05 15:35:18

npub163k0gvm3x8s7qqjnukf6a2jq4nus3s74j7y64pctcgrwy3nmsllswwhljx The problem is the LLM dataset. The input data must be nearly 100% accurate, or the AI will learn incorrectly. There are millions of images out there, a huge dataset, but almost all of them have small mistakes.

Also, how do we input the known-good results? Radiology reports, like all dictated reports, contain transcription errors that humans can read past but that would likely break the model.

This means every image used for learning must be manually reviewed and typo-corrected.

Author Public Key

npub1hpyg69txhnyul5ym68s5szluplscvgq7qcxn66q2jqxyv4vjcsnsce75x3


Baronet Michael on Nostr