Even the strongest frontier model they used - GPT 5.2 - I would consider barely usable for agentic programming.

I’m not really interested in analysis of the weaknesses of such models because in my experience many weaknesses disappear entirely as models get stronger and reasoning effort is turned up. Especially if you tell them what you want them to do.

Also, it’s not surprising to learn that when more acceptance criteria are added the failure rate increases.

Oldheads remember when GPT 5.2 was at the forefront of agentic programming. December 2025 feels like eons ago, but alack it was an entire half year!

Wait isn't gpt 5.2 good? Or is it not thinking / not codex? 5.2 was what sparked the late 2025 openai agentic programming revolution.