Guess it really depends on what you use them for. I've been able to build whole apps with them, not slop. Kimi is quite good at design, and for 3D I've noticed Gemini 3.1 is excellent for basic to medium use cases.

I've tried both Opus and GPT 5.4; they hallucinate just like the rest, at a much higher cost.

The more you use a model over time, the better you get with it. It's really hard to measure; my main metric lately has been tokens per second / time to complete a task.

At this point I have the feeling frontier models are optimizing for benchmarks and one-shot prompts.