There is a clear difference between what OpenAI manages to do with GPT-5 and what I manage to do with GPT-5. The other day I asked for code to generate a linear regression and it gave back a figure of some points with a line through them.
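For reference, the kind of minimal script one might expect back for that request (a sketch using numpy's `polyfit` on synthetic data; the figure step is left out):

```python
import numpy as np

# Synthetic data: y ≈ 2x + 1 with Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 50)

# Ordinary least squares fit of a degree-1 polynomial: (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```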
If GPT-5, as claimed, is able to solve all problems in ICPC, please provide instructions on how I can reproduce that.
I believe this is going to be an increasingly important factor.
Call it the “shoelace fallacy”: Alice is supposedly much smarter but Bob can tie his shoelaces just as well.
The choice of eval, prompt scaffolding, etc. all dramatically impact the intelligence that these models exhibit. If you need a PhD to coax PhD performance from these systems, you can see why the non-expert reaction is “LLMs are dumb” / progress has stalled.
Yeah, until OpenAI says "we pasted the questions from ICPC into chatgpt.com and it scored 12/12" the average user isn't really going to be able to reproduce their results.
The average user will never need to answer ICPC questions though.
No, but average users have things they want to do that require ICPC-level problem solving, like making optimized games. Average users want that for sure.
The average person doesn't need to do that. The benchmark for "is this response accurate and personable enough" on any basic chat app has been saturated for at least a year at this point.
Are you using the thinking model or the non-thinking model? Maybe you can share your chat.
I prefer not to due to privacy concerns. Perhaps you can try yourself?
I will say that after checking, I see that the model is set to "Auto", and as mentioned, used almost 8 minutes. The prompt I used was:
It did a lot of thinking, and I can see that it visited 13 webpages, including icpc, codeforces, geeksforgeeks, github, tehrantimes, arxiv, facebook, stackoverflow, etc. A terse prompt and expecting a one-shot answer is really not how you'd get an LLM to solve complex problems.
I don't know what DeepMind and OpenAI did in this case, but to get an idea of the kind of scaffolding and prompting strategy one might want, have a look at this paper where some folks used the normal, generally available Gemini 2.5 Pro to solve 5/6 of the 2025 IMO problems: https://arxiv.org/pdf/2507.15855
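The general shape of that kind of scaffolding is a generate-then-verify loop: sample candidate solutions and keep the first one a verifier pass accepts. A hedged sketch, with `generate` and `verify` as hypothetical stand-ins for model calls (the paper's actual pipeline is more elaborate):

```python
def solve_with_verification(problem, generate, verify, max_attempts=8):
    """Sample candidates and return the first one the verifier accepts.

    `generate` and `verify` are placeholders for LLM calls; in setups like
    the linked paper, the verifier is itself a carefully prompted model pass.
    """
    for _ in range(max_attempts):
        candidate = generate(problem)
        if verify(problem, candidate):
            return candidate
    return None  # no candidate survived verification

# Toy stand-ins to show the control flow (not real model calls)
answers = iter(["wrong", "wrong", "right"])
result = solve_with_verification(
    "toy problem",
    generate=lambda p: next(answers),
    verify=lambda p, c: c == "right",
)
```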
The point of the GPT-5 model is that it is supposed to route between thinking/non-thinking smartly. Leveraging prompt hacks such as instructing it to "think carefully" to force routing to the thinking model goes against OpenAI's claims.
Just select GPT-5 Thinking if you need anything done with competence. The regular GPT-5 is nothing impressive and is geared more towards everyday chatting.
Are you sure? I thought you could only specify reasoning_effort and that's it.
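For the API (as opposed to the chatgpt.com model picker), a `reasoning_effort` parameter does exist for OpenAI's reasoning models. A sketch of what the request payload might look like; the accepted values are assumed from OpenAI's docs and not verified against GPT-5 specifically:

```python
# Hypothetical Chat Completions request payload; "reasoning_effort"
# accepts levels such as "low" / "medium" / "high".
payload = {
    "model": "gpt-5",
    "reasoning_effort": "high",
    "messages": [
        {"role": "user", "content": "Solve this ICPC-style problem..."},
    ],
}
print(payload["reasoning_effort"])
```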
If you can't get a modern LLM to generate a simple linear regression I think what you have is a problem between the keyboard and the chair...