There is a clear difference between what OpenAI manages to do with GPT-5 and what I manage to do with GPT-5. The other day I asked for code to generate a linear regression and it gave back a figure of some points with a line through them.
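For reference, the kind of minimal script one might expect back for that request (a sketch using numpy's `polyfit` on synthetic data; the figure step is left out):

```python
import numpy as np

# Synthetic data: y ≈ 2x + 1 with Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 50)

# Ordinary least squares fit of a degree-1 polynomial: (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```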
If GPT-5, as claimed, is able to solve all problems in ICPC, please provide instructions on how I can reproduce that.
I believe this is going to be an increasingly important factor.
Call it the “shoelace fallacy”: Alice is supposedly much smarter but Bob can tie his shoelaces just as well.
The choice of eval, prompt scaffolding, etc. all dramatically impact the intelligence that these models exhibit. If you need a PhD to coax PhD performance from these systems, you can see why the non-expert reaction is “LLMs are dumb” / progress has stalled.
Yeah, until OpenAI says "we pasted the questions from ICPC into chatgpt.com and it scored 12/12" the average user isn't really going to be able to reproduce their results.
The average user will never need to answer ICPC questions though.
No, but average users have things they want to do that require ICPC-level problem solving, like making optimized games. Average users want that for sure.
The average person doesn't need to do that. The benchmark for "is this response accurate and personable enough" on any basic chat app has been saturated for at least a year at this point.
Are you using the thinking model or the non-thinking model? Maybe you can share your chat.
I prefer not to due to privacy concerns. Perhaps you can try yourself?
I will say that after checking, I see that the model is set to "Auto", and as mentioned, used almost 8 minutes. The prompt I used was:
It did a lot of thinking, and I can see that it visited 13 webpages, including icpc, codeforces, geeksforgeeks, github, tehrantimes, arxiv, facebook, stackoverflow, etc. A terse prompt and expecting a one-shot answer is really not how you'd get an LLM to solve complex problems.
I don't know what DeepMind and OpenAI did in this case, but to get an idea of the kind of scaffolding and prompting strategy one might want, have a look at this paper where some folks used the normal, generally available Gemini 2.5 Pro to solve 5/6 of the 2025 IMO problems: https://arxiv.org/pdf/2507.15855
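The general shape of that kind of scaffolding is a generate-then-verify loop: sample candidate solutions and keep the first one a verifier pass accepts. A hedged sketch, with `generate` and `verify` as hypothetical stand-ins for model calls (the paper's actual pipeline is more elaborate):

```python
def solve_with_verification(problem, generate, verify, max_attempts=8):
    """Sample candidates and return the first one the verifier accepts.

    `generate` and `verify` are placeholders for LLM calls; in setups like
    the linked paper, the verifier is itself a carefully prompted model pass.
    """
    for _ in range(max_attempts):
        candidate = generate(problem)
        if verify(problem, candidate):
            return candidate
    return None  # no candidate survived verification

# Toy stand-ins to show the control flow (not real model calls)
answers = iter(["wrong", "wrong", "right"])
result = solve_with_verification(
    "toy problem",
    generate=lambda p: next(answers),
    verify=lambda p, c: c == "right",
)
```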
The point of the GPT-5 model is that it is supposed to route between thinking/non-thinking smartly. Leveraging prompt hacks such as instructing it to "think carefully" to force routing to the thinking model goes against OpenAI's claims.
Just select GPT-5 Thinking if you need anything done with competence. The regular GPT-5 is nothing impressive and is geared more towards everyday chatting.
Are you sure? I thought you could only specify reasoning_effort and that's it.
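For the API (as opposed to the chatgpt.com model picker), a `reasoning_effort` parameter does exist for OpenAI's reasoning models. A sketch of what the request payload might look like; the accepted values are assumed from OpenAI's docs and not verified against GPT-5 specifically:

```python
# Hypothetical Chat Completions request payload; "reasoning_effort"
# accepts levels such as "low" / "medium" / "high".
payload = {
    "model": "gpt-5",
    "reasoning_effort": "high",
    "messages": [
        {"role": "user", "content": "Solve this ICPC-style problem..."},
    ],
}
print(payload["reasoning_effort"])
```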
If you can't get a modern LLM to generate a simple linear regression I think what you have is a problem between the keyboard and the chair...