I haven't hit the "dumb zone" in two months. I think this talk is outdated.

I'm always running CC (Opus) with thinking and Codex with xhigh.

And the models have gotten really good when you let them work on goals they can verify themselves. I had Codex successfully fix a Rust B-rep CSG classification pipeline over the course of a week, unsupervised. It had a custom STEP viewer that would take screenshots and feed them back into the model, so it could check for itself whether it was making progress or just producing triangle soup.
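The verification loop described above can be sketched roughly like this. Everything here is a hypothetical stand-in, not the actual setup: `render_screenshot` stands in for the custom STEP viewer, `looks_correct` for the model judging the screenshot, and `apply_fix` for the agent editing the code.

```python
# Minimal sketch of a render -> inspect -> fix loop, assuming hypothetical
# stand-ins for the viewer and the model's visual check.

def render_screenshot(state: int) -> str:
    # Stand-in for "the STEP viewer renders the current geometry to an image".
    return f"screenshot_of_state_{state}.png"

def looks_correct(screenshot: str, goal: int) -> bool:
    # Stand-in for "feed the screenshot back to the model to verify progress".
    return screenshot == f"screenshot_of_state_{goal}.png"

def apply_fix(state: int) -> int:
    # Stand-in for "the agent applies another code change".
    return state + 1

def fix_loop(start: int, goal: int, max_iters: int = 10) -> tuple[bool, int]:
    """Iterate: render, let the model verify, apply a fix, until it looks right."""
    state = start
    for i in range(1, max_iters + 1):
        shot = render_screenshot(state)
        if looks_correct(shot, goal):
            return True, i  # geometry verified, stop fixing
        state = apply_fix(state)
    return False, max_iters  # gave up without visual confirmation

ok, iters = fix_loop(start=0, goal=3)
```

The point of closing the loop through screenshots is that the agent's success criterion is the rendered geometry itself, not its own claim that the code is fixed.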

Codex did all the planning and verification, CC wrote the code.

In my experience, this would not have been possible at all six months ago.

Maybe with a lot of handholding, but I doubt it (I tried).

I mean both the problem itself (it requires a lot of spatial reasoning and the math that goes with it) and the autonomous implementation. Context compression was never an issue during the entire session, for either model.