The fact that a huge number of parameters may lead to worse hallucinations is something I didn't think of. Would this somewhat imply that DeepSeek V4 flash should be less prone to these issues?
Small models cannot encode as many facts, so they will hallucinate more out of the box.
A key way to reduce hallucinations is to provide good sources when asking questions (context engineering / knowledge base).
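A minimal sketch of "providing sources" in practice follows. It assumes a tiny in-memory knowledge base and uses crude keyword overlap as a stand-in for real retrieval (a production setup would more likely use embeddings or a search index). The names `KNOWLEDGE_BASE`, `retrieve`, and `build_grounded_prompt` are hypothetical, and no particular model API is assumed. The function only builds the prompt text you would send.

```python
import re

# Hypothetical knowledge base; in practice this would be your own documents.
KNOWLEDGE_BASE = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "Python was first released by Guido van Rossum in 1991.",
]

# Very small stopword list so common words don't count as matches (an assumption).
STOPWORDS = {"the", "a", "an", "is", "was", "in", "of", "and", "when", "what", "who"}


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens minus stopwords, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k docs sharing the most words with the question (ties keep original order)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(d)), i, d) for i, d in enumerate(docs)]
    scored = [s for s in scored if s[0] > 0]  # drop docs with no overlap at all
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [d for _, _, d in scored[:k]]


def build_grounded_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """Put retrieved sources before the question and ask the model to stay within them."""
    sources = retrieve(question, docs, k)
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("When was the Eiffel Tower completed?", KNOWLEDGE_BASE))
```

The model is told to answer from the supplied sources and to admit when they don't cover the question. That is the point of the approach: it relies less on whatever facts the model happened to encode in its parameters.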