> money on fire forever just to jog in place.

Why?

I don't see why these companies can't just stop training at some point. Unless you're saying the cost of inference is unsustainable?

I can envision a future where ChatGPT stops getting new SOTA models, and all future models are built for enterprise or people willing to pay a lot of money for high ROI use cases.

We don't need better models for the vast majority of chats taking place today. E.g. kids using it for help with homework: are today's models really not good enough?

They aren't. They are obsequious. This is much worse than it seems at first glance, and you can tell it's a big deal because a lot of the effort going into training the new models is aimed at mitigating it.

>I don't see why these companies can't just stop training at some point.

Because training isn't just about making brand new models with better capabilities, it's also about updating old models to stay current with new information. Even the most sophisticated present-day model with a knowledge cutoff date of 2025 would be severely crippled by 2027 and utterly useless by 2030.

Unless there is some breakthrough that lets existing models cheaply incrementally update their weights to add new information, I don't see any way around this.

Ain't never hearda rag

There is no evidence that RAG delivers equivalent performance to retraining on new data. Merely having information in the context window is very different from having it baked into the model weights. Relying solely on RAG to keep model results current would also degrade with time, as more and more information would have to be incorporated into the context window the longer it's been since the knowledge cutoff date.
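To make the distinction concrete, here is a minimal sketch of what "having information in the context window" means: retrieval selects documents at query time and prepends them to the prompt, while the weights stay frozen. The toy keyword-overlap scorer and prompt format below are illustrative stand-ins for a real vector store and template, not any particular library's API.

```python
# Toy RAG sketch: facts reach the model via the prompt at query time,
# rather than being baked into the weights. The corpus, scoring function,
# and prompt layout are all hypothetical placeholders.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend the retrieved snippets so the model answers from context."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The Foo 3.0 release removed the legacy API.",
    "Bananas are botanically berries.",
    "The Foo project migrated to a new build system in 2026.",
]
print(build_prompt("What changed in the Foo 3.0 release?", corpus))
```

Note the weakness the comment above points at: everything the model is supposed to "know" has to fit through `retrieve` and into the context window, and the further past the cutoff you go, the more the retriever has to carry.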

I honestly do not think that we should be training models to regurgitate training data anyway.

Humans do this to a minimal degree, but the things we can recount from memory are simpler than, say, the contents of an entire paper.

There's a reason we invented writing stuff down. And I do wonder if future models should be optimising for RAG in their training; train for reasoning and stringing coherent sentences together, sure, but with a focus on using that to connect hard data found in the context.

And who says models won't have massive or unbounded contexts in the future? Or that predicting a single token (or even a sub-sequence of tokens) will remain a one-shot, synchronous activity?