How do you evaluate this except by anecdote, and how do we know your experience isn't due to how you use them?

You can evaluate it as anecdote. How do I know you have the level of experience necessary to spot these kinds of problems as they arise? How do I know you're not just another AI booster with a financial stake poisoning the discussion?

We could go back and forth on this all day.

You got very defensive. It was a useful question. They were asking in terms of using a local LLM, so at best they might be in the business of selling Raspberry Pis, not proprietary LLMs.

Yeah, to me it's more poisonous that people reflexively believe any pushback must be wrong because they feel empowered, regardless of any measurement that might show that people only get (maybe) out of LLMs what they put into them, and even then we can't be sure. That this situation exists, and that people have been primed with a complete triangulation of all the arguments, simply isn't healthy. We should demand independent measurements instead of the fumbling in the dark of the current model measurements... or admit that measuring them isn't helpful and that, like a parent comment maybe alluded to, the results can only be described as anecdote and there is no discernible difference between many models.