Well, only if that one trained model continued to function as a going business. Their amortization window for the training cost is something like two months. They can't just sit on one model and collect $.

They have to build the next model, or else people will go to someone else.

Why two months? It was almost a year between Claude 3.5 and 4. (Not sure how much it costs to go from 3.5 to 3.7.)

Even being generous and saying it's a year: most capital expenditures depreciate over a period of 5-7 years. To state the obvious, training one model a year is not a saving grace.

I don't understand why the absolute time period matters — all that matters is that you get enough time making money on inference to make up for the cost of training.
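To put rough numbers on that breakeven logic (all figures here are hypothetical, just to illustrate the arithmetic, not any lab's actuals):

```python
# Purely illustrative numbers -- not any lab's real figures.
training_cost = 500e6              # one-time cost to train the model, in dollars
inference_margin_per_month = 60e6  # inference revenue minus serving costs, per month

# Breakeven: months of inference revenue needed to recoup the training spend.
breakeven_months = training_cost / inference_margin_per_month
print(f"Breakeven after {breakeven_months:.1f} months of inference")

# The model is profitable iff it stays commercially viable longer than
# the breakeven window -- regardless of any nominal depreciation schedule.
useful_life_months = 12  # assume a successor obsoletes it after a year
profit = inference_margin_per_month * useful_life_months - training_cost
print(f"Lifetime profit: ${profit / 1e6:.0f}M")
```

Under these made-up numbers the model breaks even in about 8 months and nets ~$220M over a one-year useful life; the 5-7 year depreciation convention never enters into it.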

Don't they need to accelerate that, though? Having a one-year-old model isn't really great; it's just tolerable.

I think this is debatable as more models become good enough for more tasks. Maybe a smaller proportion of tasks will require SOTA models. On the other hand, the set of tasks people want to use LLMs for will expand along with the capabilities of SOTA models.