I think they will all be minor going forward. It feels like the major improvements have all been made and we'll only see incremental improvements from here on out. Maybe I'm wrong, but we'll see.
Hard to say. People made the same prediction a year ago because we supposedly ran out of training data. There could be indefinite rapid compounding improvements so long as there's free money out there.
With RLHF and RLVR we are creating tons of new training data that is much more focused than reading the Internet. Annotation shops are doing many billions per year in revenue creating newer data, and a lot of it is highly complex, focused on rewarding multi-turn agentic trajectories.
I think there's just less time between model releases now
I think one of the challenges is that the models were all initially trained on the entire Internet (or as much as they could gather), and now they’re having to deal with an increasing amount of the Internet being AI-generated content. That may be why GPT-5.5 started being obsessed with goblins, and why you start seeing amusing things in the system prompt trying to get the model to stop bringing them up.
Wasn't Mythos a step change improvement?
Yeah. They are aware: "Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor."
I think we lack benchmarks that could meaningfully indicate progress. They are mostly garbage that's saturated at this point. God wouldn't score much higher in them.
Yes, but if version number go up, so do all other number