Unless you come up with a deterministic set of criteria for evaluating these bugs and issues, every single model is going to keep telling you it has found new things to fix.

I'm sure you said the same "find mistakes please" thing to Opus 4.8 and GPT 5.5 when you were using $previous_amazing_latest_model, and they also found and fixed them.

Once the next "Fable"-type model comes out, I'm sure it's going to find even more mistakes that the "special" Fable made.

You're using these models to make mistakes, then using upgraded versions of them to find and fix those mistakes, until a new version comes along that can magically fix even more of the mistakes its predecessors made. There's no end to it.

Yes, I was thinking this too. However, I had already worked on it so many times with Opus and GPT that I thought they'd had enough time to realise some common-sense things that Fable just got and understood on the first pass. The difference seemed significant enough to comment on.