Ask around and see if you can find anyone you know who's experienced the November 2025 effect. Claude Code / Codex with GPT-5.1+ or Opus 4.5+ really did make a material difference - they flipped the script from "can write code that often works" to "can write code that almost always works".
I know you'll dismiss that as the same old crap you've heard before, but it's pretty widely observed now.
I’ve been living this experience and using the latest models at work throughout this time. The failure modes of LLMs have not fundamentally changed. The makers are not particularly transparent about what exactly changes in each model release, not the way you know what changed in, e.g., a new Django version. But there has not been a paradigm shift. My belief/guess (from the outside) is that the big change you think you’re experiencing could be the result of many things, like better post-training (RLHF) that conditions models to run a predefined set of commands, such as always running tests, or other marginal improvements to the models and a sharper focus on programming tasks. To be clear, these improvements are welcome and useful, just not the groundbreaking change some claim.