The title says "LLMs" (plural), but they only tested one:
> We only tested OpenAI’s GPT-4.1 nano.
This should be higher. While the research question is interesting, the sample size makes the conclusion highly suspect. I'd like to see more research on this.
And not even a commonly used one. Gemini Flash or o4-mini would have been a much better choice if they wanted a cheap model.