What does this show that we didn't know already? LLMs cannot provide accurate answers to questions when the relevant data is not included in their training sets. This doesn't appear to have much substance.
LLMs can and will provide inaccurate answers to questions where data is included in their training sets too; that's in the nature of neural networks. It's just less likely than when the data is not in the training set...
Unfortunately, most people are not aware of this and treat LLMs as a superpowered brain that knows everything and can do everything.
Well then it shows that these models are using widely disparate training sets and have high confidence even when they shouldn't.
Questions like "is mouthwash effective" presumably have one solid data source: medical journals.
But the prompt didn't give the models the option to say "I don't know", so it wasn't a measure of their confidence.
What are you talking about? The models were not ALLOWED to have confidence (or the lack thereof). They were explicitly told to give a single label, and in most cases, all of them were correct depending on additional context they would surely have provided, especially with access to the internet (which some didn't have). This is just silly.
They will happily google it for you and give you the top reddit comment.
This is worse.