ICYMI, Amodei said the same in much greater detail:

"If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable.

What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at."

https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop...

The "model as company" metaphor makes no sense. It should actually be models are products, like a shoe. Nike spends money developing a shoe, then building it, then they sell it, and ideally those R&D costs are made up in shoe sales. But you still have to run the whole company outside of that.

Also, in Nike's case, as they grow they get better at making more shoes more cheaply. LLM providers tell us that every new model (shoe) costs multiples more than the last one to develop. If revenue is 2x training cost, as he says, then to stay profitable they have to at least double prices or double users with every generation, or stop making new models.

But new models to date have cost more than the previous ones to create, often by an order of magnitude, so the shoe metaphor falls apart.
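To make the bind concrete with toy numbers (none of this is anyone's real financials): if training cost grows roughly 10x per generation and you want to hold Amodei's 2x revenue-to-training ratio, revenue has to grow roughly 10x per generation too.

    # Toy per-generation economics (all numbers invented for illustration):
    # - each new model costs 10x more to train than the last
    # - to keep the "revenue = 2x training cost" ratio, revenue must scale the same way
    train_cost = 100e6  # generation 1: $100M to train
    for gen in range(1, 5):
        needed_revenue = 2 * train_cost   # revenue required to stay at the 2x ratio
        print(f"gen {gen}: training ${train_cost/1e6:>7,.0f}M -> "
              f"needs revenue ${needed_revenue/1e6:>7,.0f}M")
        train_cost *= 10                  # next generation is an order of magnitude pricier

Prices times users has to scale by the same factor as training cost, which is the part the shoe analogy has no answer for.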

A better metaphor would be oil and gas production, where existing fields are either depleted (i.e. the model is no longer SOTA -- no longer earning a return on investment) or currently producing (SOTA inference -- earning a return on investment). The key similarity with AI is that new oil and gas fields are increasingly expensive to bring online: they are harder to make economical than the first ones we stumbled across bubbling up in the desert, even with technological innovation. That is to say, the low-hanging fruit is long gone.

> new models to date have cost more than the previous ones to create

This was largely the case in software from the '80s through the '10s (when versioned releases largely disappeared) and is still the case in hardware. The iPhone 17 will certainly cost far more to develop than the iPhone 10 or 5 did, and the iPhone 5 cost far more than the 3G, etc.

I don't think it's the case if you take inflation into account.

You could see here: https://www.reddit.com/r/dataisbeautiful/comments/16dr1kb/oc...

New ones are generally cheaper if adjusted for inflation. Those are sale prices, but assuming margins stay the same they should reflect manufacturing cost. And from what I remember of Apple's earnings, their margins increased over time, which means the new phones are even cheaper to make. Which kind of makes sense.

I should have addressed this. This thread is about the capital costs of getting to the first sale, so that's model training for an LLM vs all the R&D in an iPhone.

Recent iPhones use Apple's own custom silicon for a number of components, and are generally vastly more complex. The estimates I have seen for iPhone 1 development range from $150 million to $2.5 billion. Even adjusting for inflation, a current iPhone generation costs more to develop than the older versions did.

And it absolutely makes sense for Apple to spend more in total to develop successive generations, because they have less overall product risk and larger scale to recoup.

exactly: it’s like making shoes if you’re really bad at making shoes :)

If you're going to use shoes as the metaphor, a model would be more like a shoe factory. A shoe would be an LLM answer, i.e. inference. In which case it totally makes sense to consider each factory as an autonomous economic unit, like a company.

Analogies don't prove anything, but they're still useful for suggesting possibilities for thinking about a problem.

If you don't like "model as company," how about "model as making a movie?" Any given movie could be profitable or not. It's not necessarily the case that movie budgets always get bigger or that an increased budget is what you need to attract an audience.

I believe a better analogy is CPU development on the next process node.

Each node is much more expensive to design for, but when you finally have it you basically print money.

And of course you always have to develop the next, more powerful and power-efficient CPU to stay competitive.

>Also, in Nike's case, as they grow they get better at making more shoes for cheaper.

This is clearly the case for models as well. Training and inference for GPT-4-level models are probably >100x cheaper than they used to be. Nike has been making Jordan 1s for 40+ years! OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT-4-level model!

>>This is clearly the case ... probably

>>OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model!

If gpt4 was basically free money at this point it's real weird that their first instinct was to cut it off after gpt5

> If gpt4 was basically free money at this point it's real weird that their first instinct was to cut it off after gpt5

People find the UX of choosing a model very confusing, the idea with 5 is that it would route things appropriately and so eliminate this confusion. That was the motivation for removing 4. But people were upset enough that they decided to bring it back for a while, at least.

They picked the worst possible time to make the change if money wasn’t involved (which is why I assumed GPT-5 must be massively cheaper to run). The backlash from being forced to use it cost a fair bit of the model’s reputation.

Yeah, it didn't work out for them, for sure.

I think the idea here is that gpt-5-mini is the cheap gpt-4 quality model they want to serve and make money on.

It's "model as a company" because people are applying the VC mentality, and because it helps explain the competition.

Model as a product is the reality, but each model competes with previous models and is only successful if it's both more cost-effective and more capable at its tasks. By the time you get to model Z, you'll never use model A for any task, as the lineage cannibalizes its own sales.

OpenAI and Anthropic have very different customer bases and usage profiles. I'd estimate a significantly higher percentage of Anthropic's tokens are paid for by the customer than OpenAI's. The ChatGPT free tier is orders of magnitude more popular than Claude's free tier, and Anthropic in all likelihood does a higher percentage of API business versus consumer business than OpenAI does.

In other words, it's possible this story is correct and true for Anthropic, but not true for OpenAI.

Good point, very possible that Altman is excluding free tier as a marketing cost even if it loses more than they make on paid customers. On the other hand they may be able to cut free tier costs a lot by having the model router send queries to gpt-5-mini where before they were going to 4o.

This is very true. ChatGPT has a very generous free tier. I used to pay for it, but realized I was never really hitting the limits that would make paying worthwhile.

However, at the same time, I was using Claude much less, really preferring its answers most of the time, and constantly being hit with limits. So guess what I did: I cancelled my OpenAI subscription and moved to Anthropic. On top of that, I get Claude Code, which OpenAI really has no serious competitor for.

I still use both models, but I never run into limits with OpenAI's free tier, so I see no reason to pay for it.

Free tier provides a lot of training material. Every time you correct ChatGPT on its mistakes you’re giving them knowledge that’s not in any book or website.

That's a moat, albeit one that is slow to build.

That's interesting, though you have to imagine the data set is very low quality on average and distilling high quality training pairs out of it is very costly.

Hence the exponential increase in model training costs. Also the hallucinations in the long tail of knowledge.

Okay, but notably he invents two numbers and then pretends a third is irrelevant in order to claim that each model (which is not a company) is a profitable company.

You'd think the CEO might be able to give a ballpark figure for the profit made off that 2023 model.

ETA: "You paid $100 million... There's some cost to inference with the model, but let's just assume ... that even if you add those two up, you're kind of in a good state."

You see this right? He literally says that if you assume revenue exceeds costs then it's profitable. He doesn't actually say that it does though.

Also, Amodei assumes that a $100M model will make $200M of revenue and a $1B model will make $2B of revenue. Does that really hold up? There's nothing that prevents them from making only $200M of revenue off a $1B model.
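To spell that out with invented numbers (nothing here is real data), the "each model is its own profitable company" claim flips sign the moment either assumption slips:

    # Per-model P&L under different assumptions (all numbers invented for illustration)
    def model_pnl(train_cost, revenue, inference_cost):
        """Profit of a single model treated as its own 'company'."""
        return revenue - train_cost - inference_cost

    # Amodei's cartoon: $100M training, $200M revenue, modest inference cost -> profitable
    print(model_pnl(100e6, 200e6, 50e6) / 1e6)   # +50.0 ($M)

    # Next model costs $1B to train, but nothing forces revenue to scale with it -> big loss
    print(model_pnl(1e9, 200e6, 50e6) / 1e6)     # -850.0 ($M)

    # Or revenue does hold at 2x training cost, but inference eats the margin -> still a loss
    print(model_pnl(100e6, 200e6, 150e6) / 1e6)  # -50.0 ($M)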

> So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.'

GPT-4.5 has entered the chat..

>> If we didn't pay for training, we'd be a very profitable company.

> ICYMI, Amodei said the same

No. He says that, even paying for training, a model is profitable. It makes more revenue than it costs - all things considered. A much stronger claim.

I take them to be saying the same thing — the difference is that Altman is referring to the training of the next model happening now, while Amodei is referring to the training months ago of the model you're currently earning money back on through inference.

Maybe he means that, but the quote says “We're profitable on inference.” - not “We're profitable on inference including training of that model.”

This sounds like fabs.

Fantastic perspective.

Basically each new company puts competitive pressure on the previous company, and together they compress margins.

They are racing themselves to the bottom. I imagine they know this and bet on AGI primacy.

> I imagine they know this and bet on AGI primacy.

Just like Uber and Tesla are betting on self-driving cars. I think it's been 10 years now ("any minute now").

Notably, Uber switched horses and now runs Waymos with no human drivers.

The drivers are remote, but they are still there, dropping in as needed.

Copy laundering as a service is only profitable when you discount future settlements:

https://www.reuters.com/legal/government/anthropics-surprise...

I don't see why the declining marginal returns can't be continuous.