Hacker News

There's also a lot of benchmark trickery going on; it's becoming harder to see how much the latest models have really improved.

The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.

bonesss 17 hours ago

I’m an LLM fan, but from an engineering perspective the idea of building atop services that palpably fluctuate in capacity, performance, and capability is nutty.

Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real time. Tuesday’s behaviour changes by Thursday; 10 AM’s production isn’t possible at 11:30 AM. Nutty.

targafarian 16 hours ago

I chilled significantly on using Google for anything to do with business due to API (and offering) instability. (I still use Google for personal things.) But AI models seem orders of magnitude more fluid, so to my risk-averse eye, they're nothing I'd base my own business on.

senordevnyc 9 hours ago

Imagine having a business where you're at the mercy of the fluctuations in capacity, performance, and capability that your human employees display!

intothemild 12 hours ago

Since I started running my own inference server, I've had zero degradation that I didn't do myself. Basically the only time I see it get worse is if I drop one of the quants.

Which is what I suspect the providers are doing to fit more inference on the same amount of hardware over time.
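For anyone unfamiliar with why a lower quant costs quality, here is a toy sketch. It assumes plain symmetric uniform quantization on random Gaussian "weights"; real inference quant formats are more sophisticated, and the names `quantize_roundtrip` and `mean_abs_error` are made up for illustration. It shows only the general trade-off (fewer bits, less memory, more rounding error) and says nothing about what any provider actually does.

```python
import random


def quantize_roundtrip(weights, bits):
    """Quantize to a signed `bits`-bit grid, then map back to floats."""
    levels = 2 ** (bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:                            # all-zero input: nothing to round
        return list(weights)
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(original, restored):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(original, restored)) / len(original)


random.seed(0)
# Stand-in for a weight tensor; purely synthetic data.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 4, 2):
    restored = quantize_roundtrip(weights, bits)
    size_kib = len(weights) * bits / 8 / 1024
    err = mean_abs_error(weights, restored)
    print(f"{bits}-bit: ~{size_kib:.1f} KiB, mean abs error {err:.4f}")
```

The storage shrinks in proportion to the bit width while the rounding error grows, which is the trade-off you feel when you step down a quant on your own box.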

Barbing 17 hours ago

Interesting, Claude might be doing better since I last checked:

https://marginlab.ai/trackers/claude-code-historical-perform...

There were at least a couple of these degradation trackers.