I notice this less with GPT-5 and GPT-5-Codex, but they have a new problem: they'll write a sentence that mostly makes sense but contains one or two strange word choices that nobody would use in that situation. They also tend to use a lot of very dense jargon that makes the output hard to read, spitting out references to various algorithms and concepts in places where they don't actually make sense. Also, yesterday Codex refused a task from me because it would be too much work, which I thought was pretty ridiculous - it wasn't actually that much work, a couple hundred lines max.

> refused a task from me because it would be too much work

Was this after many iterations? Try letting it get some "sleep". Hear me out...

I haven't used Codex, so this may not be relevant, but with Claude I always notice a slow degradation in quality, more refusals, and "<implementation here>" placeholders as iterations pile up within the same context window. One time, after making a mistake, it apologized and said something like "that's what I get for writing code at 2am". Statistically, this makes sense: long conversations between developers go into the night, the developers get tired, and their code gets sparser and crappier.

So I told it "Ok, let's get some sleep and do this tomorrow.", then in the very next message (since the LLM has no concept of time) said "Good morning! Let's do this!" - and bam, it output a completely functional, giant block of code.
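If you're driving the model through an API rather than a chat app, you can script the same trick by splicing the "sleep" exchange into the message history before resuming. Here's a minimal sketch, assuming the Anthropic Python SDK; the conversation contents, the assistant's "good night" reply, and the model id are all illustrative, not anything from the original exchange:

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# The long conversation so far, where output quality has degraded
# (contents are placeholders for illustration).
history = [
    {"role": "user", "content": "Implement the parser we discussed."},
    {"role": "assistant", "content": "# <implementation here>"},
]

# Splice in the "sleep" exchange, then resume as if it's a new day.
history += [
    {"role": "user", "content": "Ok, let's get some sleep and do this tomorrow."},
    {"role": "assistant", "content": "Good idea. See you tomorrow!"},
    {"role": "user", "content": "Good morning! Let's do this!"},
]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model id
    max_tokens=4096,
    messages=history,
)
print(response.content[0].text)
```

No idea whether this reliably beats just starting a fresh conversation, but it keeps the accumulated context while framing the next reply as rested, morning output.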

Human behavior is deeeeep in the statistics.

That's hilarious.