Confabulations are decreasing with newer models. I tested confabulations based on provided documents (relevant for RAG) here: https://github.com/lechmazur/confabulations/. Note the significant difference between GPT-4 Turbo and GPT-4o.
Is 3% supposed to be significant? Or did you mean 4 Turbo and 4o mini?
It is significant because of the other chart that shows MUCH lower non-response rates for GPT-4o.
That’s very interesting! Thanks for the link.