Even being generous and saying it's a year, most capital expenditures depreciate over a period of 5-7 years. To state the obvious, training one model a year is not a saving grace.
I don't understand why the absolute time period matters — all that matters is that you get enough time making money on inference to make up for the cost of training.
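As a rough back-of-envelope sketch of that break-even argument (all numbers below are made-up assumptions, not figures from this thread): if a model costs C to train and earns a gross margin of m per month on inference, it pays for itself after C / m months; the question is whether that lands before a newer model makes it obsolete.

# Break-even sketch; every number here is a hypothetical assumption.
def breakeven_months(training_cost: float, monthly_inference_margin: float) -> float:
    """Months of inference gross margin needed to recoup a one-off training cost."""
    return training_cost / monthly_inference_margin

# Hypothetical example: a $100M training run earning $10M/month in inference margin
# recoups its cost in 10 months -- fine if the model stays relevant that long,
# a problem if a successor obsoletes it sooner.
print(breakeven_months(100e6, 10e6))  # -> 10.0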
Don't they need to accelerate that, though? Having a 1-year-old model isn't really great; it's just tolerable.
I think this is debatable as more models become good enough for more tasks. Maybe a smaller proportion of tasks will require SOTA models. On the other hand, the set of tasks people want to use LLMs for will expand along with the capabilities of SOTA models.