My response is simply that performance in coding competitions such as ICPC is a very different skillset from what a regular software engineering job requires. GPT-5 still cannot make sense of my company's legacy codebase, even when asked to do the most basic tasks that a new grad out of college can figure out in a day or two. I recently asked it to fix a broken test (I had broken it myself by changing a single assertion) and it declared "success" by deleting the entire test suite.
This. Dealing with a real-world legacy codebase is the exact opposite of a perfectly constrained problem that has been verified for internal consistency, probably by both computers and humans, and presented neatly in a single PDF. There are dozens, if not hundreds, of assumptions that humans make while solving a problem (e.g., make sure you don't crash the website on your first day at work!) that an LLM is not going to make. It's similar to why, despite all the hype, Waymo cars are still being supervised by human drivers nearly 100% of the time and can't even park themselves regularly without stalling with no explanation.
> Waymo cars are still being supervised by human drivers nearly 100% of the time
That seems...highly implausible?
I mean that a human is ready to jump in at any point an "exception" happens.
Example: During parking, which I witness daily in my building, it happens all the time.
1. Car gets stuck trying to park, blocking either the garage or a whole SF street.
2. A human intervenes, either in person (most often) or seemingly remotely, to get the car unstuck.
I'm not in the US and have never seen a self-driving car.
Can you explain how a human intervenes in person?
Do you mean these cars have a human driver on board? Or the passenger drives? Or another car drops off a driver? Or your car park is such an annoying edge case that a driver hangs around there all the time just to help park the cars?
Similar experience with Windsurf.
I had a class of 5 or so test methods - ABCDE. I asked it to fix C, so it started typing out B token-by-token underneath C, such that my source file was now ABCBDE.
I don't think I'm smart enough to get it to do coding activities.
> it declared "success" by deleting the entire test suite.
The paperclip trivial solution!