OK, explain one thing to me: I have a benchmark - I feed an identical prompt to multiple models. Codex produces a rough but working program. Fable produces the same - but with more bugs than Codex. Opus produces something similar to Codex but with a critical bug.
That describes all my tests with Fable.
Why should I be hyped about all that "legitimate power" if the model performs on par with two other SoTAs?
I mean, well, yes, it is impressive. It can quickly generate a lot of garbage that sorta looks like code. The other two can do the same. I don't see any groundbreaking improvement - but the price is much higher. Why the hype?
>> Why should I be hyped about all that "legitimate power" if the model performs on par with two other SoTAs?
I don't care if you're hyped or not. You asked if posts like the OP come from a "parallel reality" and I said no and described my experience. If you're getting better results with Codex than with Fable, you should probably keep using it, since it's cheaper and faster.
But can you offer anything measurable in support of your claims? I did.