The pace of notable releases across the industry right now is unlike any time I remember since I started doing this in the early 2000s. And it feels like it's accelerating.
How is this a notable release? It's strictly worse than Gemini 2.5 on coding &c, and only an iterative improvement over their own models. The only thing that struck me as particularly interesting was the native visual reasoning.
It's not worse on coding. SWE-bench, Aider, and LiveBench coding all show noticeably better results.
Lots of releases but very little actual performance increase
Sonnet and Gemini saw fairly substantial performance increases recently
Love Sonnet, but 3.7 is not obviously an improvement over 3.5 in my real-world usage. Gemini 2.5 Pro is great and has replaced most others for me (Grok I use for things that require real-time answers).
Are you comparing it with or without thinking? I'd say it's a fairly big improvement in long thinking mode.
It does a lot better on philosophy questions.
Not really. We’re definitely in the incremental improvement stage at this point. Certainly no indication that progress is “accelerating”.
Integration is accelerating rapidly. Even if model development froze today, we would still probably have ~5 years of adoption and integration before it started to level off.
You are both correct. It feels like the tech itself is kinda plateauing but it's still massively under-used. It will take a decade or more before the deployment starts slowing down.
But we're seeing incremental improvements every two months, so...
ChatGPT 3 : iPhone 1
A bunch of models later, we're about on the iPhone 4-5 now. Feels about right.
It's more like GPT-3 is the Manchester Baby, and we're somewhere around IBM 700 series right now. Still a long way to go to iPhone, as much as the industry likes to pretend otherwise.
Both were big consumer commercial breakouts and far better than their predecessors. And several years later, both see only iterative improvements.
Neither applies to your analogy.