I've done the modeling on this a few times and I always get to a place where inference can run at 50%+ gross margins, depending mostly on GPU depreciation and how good the host is at optimizing utilization. The challenge for the margins is whether or not you consider model training costs as part of the calculation. If model training isn't capitalized + amortized, margins are great. If they are amortized and need to be considered... yikes
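Roughly the shape of that model, as a sketch (every number here is an illustrative assumption, not anyone's real financials):

```python
# Toy inference gross-margin model. Every figure is an illustrative
# assumption: GPU capex, depreciation period, utilization, pricing,
# and the amortized training cost are all made up.
gpu_capex = 8 * 30_000            # one 8-GPU server, $ per GPU
depreciation_years = 4
hosting_per_year = 40_000         # power, networking, rack space
utilization = 0.5                 # fraction of the year actually serving traffic
tokens_per_second = 10_000        # served by the whole node
price_per_m_tokens = 2.00         # blended $ per 1M tokens

tokens_per_year = tokens_per_second * 3600 * 24 * 365 * utilization
revenue = tokens_per_year / 1e6 * price_per_m_tokens
inference_cost = gpu_capex / depreciation_years + hosting_per_year

gross_margin = (revenue - inference_cost) / revenue
print(f"inference-only gross margin: {gross_margin:.0%}")   # ~68% with these inputs

# Now amortize a (hypothetical) training run over this node's share of revenue.
training_amortized = 120_000      # this node's slice of the training bill, per year
gross_margin_with_training = (revenue - inference_cost - training_amortized) / revenue
print(f"with amortized training:     {gross_margin_with_training:.0%}")  # ~30%
```

With those made-up inputs the margin swings from roughly 68% to roughly 30%, which is the "yikes" in a nutshell: the answer depends far more on how you treat training and utilization than on the serving cost itself.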

Why wouldn't you factor in training? It is not like you can train once and then have the model run for years. You need to constantly improve to keep up with the competition. The lifespan of a model is just a few months at this point.

In a recent episode of the Hard Fork podcast, the hosts discussed an on-the-record conversation they had with Sam Altman of OpenAI. They asked him about profitability, and he claimed that they are losing money mostly because of the cost of training. But as the model advances, they will train less and less. Once you take training out of the equation, he claimed, they are profitable based on the cost of serving the trained foundation models to users at current prices.

Now, when he said that, his CFO corrected him and said they aren't profitable, but added that "it's close".

Take that with a grain of salt, but that's a conversation from one of the big AI companies that is only a few weeks old. I suspect it is pretty accurate that pricing is currently reasonable if you ignore training. But training is very expensive, and it's the reason most AI companies are losing money right now.

Unfortunately for those companies, their APIs are a commodity, and are very fungible. So they'll need to keep training or be replaced with whichever competitor will. This is an exercise in attrition.

I wonder if we’re reaching a point of diminishing returns with training, at least just by scaling the data set. There’s a finite amount of information (that can be obtained reasonably) to be trained on, and I think we’re already at a sizable chunk of that, not to mention the cost of naively scaling up. My guess is that the ultimate winner will be the one that figures out how to improve without massive training costs, through better algorithms, or maybe even just better hardware (i.e. neuristors). We know that, in the worst case, we should be able to build something with human-level intelligence that takes about 20 watts to run and is about the size of a human head, and you only need to ingest a small slice of all available information to do that. And training should only use about 3.5 MWh total, and can be done with the same hardware that runs the model.
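(Quick arithmetic check on that last number: it's just a ~20 W brain running for roughly 20 years, both figures being the assumptions here.)

```python
# Energy to "train" a human-level brain: ~20 W continuous draw over
# ~20 years of development (both figures assumed).
watts = 20
hours = 20 * 365 * 24                # ~175,200 hours
energy_mwh = watts * hours / 1e6     # Wh -> MWh
print(f"{energy_mwh:.1f} MWh")       # ~3.5 MWh
```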

You lost me at "Sam Altman says".

> But as the model advances, they will train less and less.

They sure have a lot of training to do between now and whenever that happens. Rolling back from 5 to whatever was before it is their own admission of this fact.

I think that actually proves the opposite. People wanted an old model, not a new one, indicating that for that user base they could have just... not trained a new model.

That is for a very specific class of use cases. If they had turned up the sycophancy on the new model, those people would not have called for the old one.

The reasoning here is off. It is like saying new game development is nearly over because some people keep playing old games.

My feeling: we've barely scratched the surface on the mileage we can get out of even today's frontier models, and we are just at the beginning of a huge runway for improved models and architectures. Watch this space.

for their user base, sure

for their investors, however, they are promising a revolution

If people want old models, they can go to any of the competitors: DeepSeek, Claude, open-source models, etc. That's not good news for OpenAI.

> most AI companies are losing money right now

which is completely "normal" at this point, """right"""? if you have billions of VC money chasing returns there's no time to sit around, it's all in, the hype train doesn't wait for bootstrapping profitability. and of course with these gargantuan valuations and mandatory YoY growth numbers, there is no way they are not fucking with the unit economics too. (biases are hard to beat, especially if there's not much conscious effort to do so.)

Does the cost of goods come down 10x or not? For, say, Uber it didn't, so we went from the great $6 VC-funded product to the mediocre $24 ride product we have today. I'm not sure I'm going to use Copilot at $1 per request. Or even $0.25. It starts to approach an overseas consultant in price and ability.

I suspect we've already reached the point, with models at the GPT-5 tier, where the average person will no longer recognize improvements; such a model can be slightly improved at long intervals and indeed run for years. Meanwhile, research-grade models will still need to be trained at massive cost to improve performance on relatively short time scales.

Whenever someone has complained to me about issues they are having with ChatGPT on a particular question or type of question, the first thing I do is ask them what model they are using. So far, no one has ever known offhand what model they were using, nor were they even aware that there are multiple models!

If you understand there are multiple models from multiple providers, some of those models are better at certain things than others, and how you can get those models to complete your tasks, you are in the top 1% (probably less) of LLM users.

This would be helpful if there were some kind of first principle by which to gauge that better-or-worse comparison, but there isn't one outside of people's value judgements, like the one you're offering.

[deleted]

I may not qualify as an "average user" but I shudder imagining being stuck using a 1+ yr stale model for development given my experiences using a newer framework than what was available during training.

Passing in docs usually helps, but I've had some incredibly aggravating experiences where a model just absolutely cannot accept that its "mental model" is incorrect and that it needs to forget the tens of thousands of lines of out-of-date example code it ingested during training. IMO it's an under-discussed aspect of the current effectiveness of LLM development, thanks to the training arms race.

I recently had to fight Gemini to accept that a library (a Google developed AI library for JS, somewhat ironically) had just released a major version update with a lot of API changes that invalidated 99% of the docs and example code online. And boy was there a lot of old code floating around thanks to the vast amounts of SEO blog spam for anything AI adjacent.

>Passing in docs usually helps, but I've had some incredibly aggravating experiences where a model just absolutely cannot accept that its "mental model" is incorrect and that it needs to forget the tens of thousands of lines of out-of-date example code it ingested during training. IMO it's an under-discussed aspect of the current effectiveness of LLM development, thanks to the training arms race.

I think you overestimate the amount of code turnover in 6-12 months...

Strangely, I feel GPT-5 is the opposite of an improvement over the previous models, and I'm considering just using Claude for actual work. Also the voice mode went from really useful to useless: “Absolutely, I will keep it brief and give it to you directly. …some wrong answer… And there you have it! As simple as that!”

>Strangely, I feel GPT-5 is the opposite of an improvement over the previous models

This is almost surely wrong, but my point was about GPT-5-level models in general, not GPT-5 specifically...

The "Pro" variant of GTP-5 is probably the best model around and most people are not even aware that it exists. One reason is that as models get more capable, they also get a lot more expensive to run so this "Pro" is only available at the $200/month pro plan.

At the same time, more capable models are also a lot more expensive to train.

The key point is that the relationship between all these magnitudes is not linear, so the economics of the whole thing start to look wobbly.

Soon we will probably arrive at a point where these huge training runs must stop, because the performance improvement does not match the huge cost increase, and because the resulting model would be so expensive to run that the market for it would be too small.

>Soon we will probably arrive at a point where these huge training runs must stop, because the performance improvement does not match the huge cost increase, and because the resulting model would be so expensive to run that the market for it would be too small.

I think we're a lot more likely to get to the limit of power and compute available for training a bigger model before we get to the point where improvement stops.

As long as models continue on their current rapid improvement trajectory, retraining from scratch will be necessary to keep up with the competition. As you said, that's such a huge amount of continual CapEx that it's somewhat meaningless to consider AI companies' financial viability strictly in terms of inference costs, especially because more capable models will likely be much more expensive to train.

But at some point, model improvement will saturate (perhaps it already has). At that point, model architecture could be frozen, and the only purpose of additional training would be to bake new knowledge into existing models. It's unclear if this would require retraining the model from scratch, or simply fine-tuning existing pre-trained weights on a new training corpus. If the former, AI companies are dead in the water, barring a breakthrough in dramatically reducing training costs. If the latter, assuming the cost of fine-tuning is a fraction of the cost of training from scratch, the low cost of inference does indeed make a bullish case for these companies.

> If the latter, assuming the cost of fine-tuning is a fraction of the cost of training from scratch, the low cost of inference does indeed make a bullish case for these companies.

On the other hand, this may also turn into cost-effective methods such as model distillation and spot training off the large companies' models (similar to DeepSeek). This would erode the comparative advantage of Anthropic and OpenAI, and result in a pure value-add play for integration with data sources and features such as SSO.

It isn't clear to me that a slowing of retraining will result in advantages to incumbents if model quality cannot be readily distinguished by end-users.

> model distillation

I like to think this is the end of software moats. You can simply call a foundation model company's API enough times and distill their model.

It's like downloading a car.

Distribution still matters, of course.
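In practice the loop is pretty simple, something like this (just a sketch: the teacher model name, prompt file, and output path are placeholders, and it assumes the standard openai Python client):

```python
# Sketch of API-based distillation: collect teacher outputs, then use
# them to fine-tune a smaller "student" model.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("prompts.txt") as f:               # placeholder prompt set
    prompts = [line.strip() for line in f if line.strip()]

with open("distill_dataset.jsonl", "w") as out:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",                   # placeholder "teacher" model
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Store prompt/response pairs in a chat fine-tuning format.
        out.write(json.dumps({
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer},
            ]
        }) + "\n")

# The JSONL then feeds a supervised fine-tune of an open-weights student;
# that half is omitted here.
```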

In the same way that every other startup tries to sweep R&D costs under the rug and say “yeah but the marginal unit economics have 50% gross margins, we’ll be a great business soon”.

lol.

TBH I don't take anyone seriously unless they are talking about cash flows (FCFF or FCFE specifically).

Who cares about expense classification - show me the money!

Google and Facebook had negative free cash flow for years early in their lives. All the good investors were lolling at the bad investors lolling at the cash they were burning.

OK, and let's compare the cost of running those products, and the reinvestment, vs. the model businesses.

FCFF = EBIT(1-t)-Reinvestment. The operating expenses of the model business are much higher - so lower EBIT.

The larger the reinvestment, the larger the hole. And the longer it continues (without clear, steep barriers to entry to exclude competitors in the long run), the harder it becomes to justify a high valuation.
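To make that concrete with a toy example (all numbers invented for illustration):

```python
# Toy FCFF comparison: a high-reinvestment "model business" vs. a
# lower-opex software business. All figures are made up for illustration.
def fcff(ebit, tax_rate, reinvestment):
    """Free cash flow to the firm: EBIT(1 - t) - Reinvestment."""
    return ebit * (1 - tax_rate) - reinvestment

# Hypothetical software business: decent EBIT, modest reinvestment.
software = fcff(ebit=500e6, tax_rate=0.25, reinvestment=150e6)

# Hypothetical model business: opex (inference, salaries) eats EBIT,
# and training runs show up as huge reinvestment.
model_lab = fcff(ebit=100e6, tax_rate=0.25, reinvestment=3e9)

print(f"software FCFF:  ${software / 1e6:,.0f}M")    # positive (~$225M here)
print(f"model lab FCFF: ${model_lab / 1e6:,.0f}M")   # deeply negative (~-$2,925M here)
```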

I really dislike comparisons like this - it glosses over a lot of details.

One can explain the equation all they like - the fact is that negative free cash flow is just a reality of the early stages of some very, very good businesses.

In the '90s and early 2000s, people laughed at businesses like Amazon & Google for years. These types of people, highly focused on the free cash flow of a business in its early years, are just dumb. Sometimes a business takes a lot of investment in the early stages - whether it's capex for data centers, or S&M for enterprise software businesses, or R&D for pharma businesses, or whatever.

As for "clear steep barriers" - again, just clueless stuff. There weren't clear steep barriers to search when Google started, there were dozens of search engines. Google created them. Creating barriers to entry is expensive and the "FCFF people" imagine they arrive out of thin air. It takes a lot of time and or money to create them.

It's unclear if "the model business" is going to be high or low margin. It's unclear how high the barriers to entry for making models will be in practice. It's unclear what the reinvestment required will be. We are a few years into it. About the only thing that is clear is this: if you try to run a positive free cashflow business in this space over the next few years, you'll be crushed. If you want a shot at a large, high return on capital business come 2035, you better be willing to spend up now.

I spoke with management at a couple companies that were training models, and some of them expensed the model training in-period as R&D. That's why.

It's possible they factor in training purely as an "R&D" cost and then can tax that development at a lower rate.

[deleted]

I agree that you could get to high margins, but I think the modeling holds only if you're an AI lab operating at scale with a setup tuned for your model(s). I think the most open study on this one is from the DeepSeek team: https://github.com/deepseek-ai/open-infra-index/blob/main/20...

For others, I think the picture is different. When we ran benchmarks on DeepSeek-R1 on 8x H200 SXM using vLLM, we got up to 12K total tok/s (concurrency 200, input:output ratio of 6:1). If you're spiking up to 100-200K tok/s, you need a lot of GPUs for that. Then the GPUs sit idle most of the time.

I'll read the blog post in more detail, but I don't think the following assumptions hold outside of AI labs.

* 100% utilization (no spikes, balanced usage between day/night or weekdays)
* Input processing is free (~$0.001 per million tokens)
* DeepSeek fits into H100 cards in a way that the network isn't the bottleneck
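Rough math on why the utilization assumption dominates, using the throughput we measured above (the rental price and peak demand are assumptions for illustration):

```python
# Rough sizing/utilization math using the benchmark numbers above.
# The rental price and peak demand are assumptions for illustration.
node_tok_per_s = 12_000        # 8x H200 node, DeepSeek-R1 on vLLM (measured above)
peak_tok_per_s = 200_000       # hypothetical traffic spike
nodes_needed = -(-peak_tok_per_s // node_tok_per_s)   # ceiling division -> 17 nodes

node_cost_per_hour = 8 * 3.0   # assume ~$3/hr per H200-class GPU rental
tokens_per_node_hour = node_tok_per_s * 3600

for utilization in (1.0, 0.5, 0.2):
    cost_per_m_tok = node_cost_per_hour / (tokens_per_node_hour * utilization / 1e6)
    print(f"{utilization:.0%} utilization: ${cost_per_m_tok:.2f} per 1M total tokens")

print(f"nodes needed to absorb a {peak_tok_per_s:,} tok/s spike: {nodes_needed}")
```

At 100% utilization the per-token cost looks fine; at 20% it has roughly quintupled, and you still had to buy or rent enough nodes to cover the spike.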

I was modeling configurations purpose-built for running specific models on specific workloads. I was trying to figure out how much of a gross-margin drag some software companies could have if they hosted their own models and served them up as APIs, or as copilots integrated with their other offerings.

I wonder how much capex risk there is in this model; depreciating the GPUs over 5 years is fine if you can guarantee utilization. Losing market share might be a death sentence for some of these firms as utilization falls.

What I hear nobody talking about is the price elasticity of demand and how this plays into the economics of the model business.

I think some of the power user demand is fairly inelastic. I’ve seen developers who are allergic to spending money happily drop $200/mo on those new Claude subscriptions.

Yeah, but if you push the price up, many users will cancel their subscriptions, and you'll still end up with a tiny market segment relative to the revenue necessary to justify the purported valuations.

It's a tricky one. There is also a lot of push right now to use AI, so developers are incentivized to drop money on subscriptions. I'd have difficulty justifying $1k/month for smaller shops, but corporations will be different. If the average engineer is just 20% more productive, that is a $30-60k value to the company.
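(Back-of-envelope behind that range; the fully loaded engineer cost of $150-300k/year is my assumption, not a figure from anywhere in particular.)

```python
# Value of a 20% productivity gain, assuming a fully loaded engineer
# cost of $150k-$300k/year (assumed range).
for loaded_cost in (150_000, 300_000):
    value = 0.20 * loaded_cost
    print(f"${loaded_cost:,}/yr engineer -> ~${value:,.0f}/yr of extra output")
# Compare with ~$12k/yr for a $1k/month subscription.
```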

I don't have difficulty getting to a 20% productivity gain with AI just from automating the tasks I procrastinate on or can't focus on. Likewise the ability to code a prototype overnight/over the weekend is a reasonable extension of practical working hours.

The challenge I do see is that fully AI generated code bases devolve into slop pretty fast. The productivity cutoffs are much lower compared to human engineers.

> whether or not you consider model training costs as part of the calculation

Whether they flow through COGS/COR or elsewhere on the income statement, they've gotta be recognized. In which case, either you have low gross margins or low operating profit (low net income??). Right?

That said, I just can't conceive of a way that training costs are not hitting gross margins. Be it IFRS/GAAP etc., training is 1) directly attributable to the production of the service sold, 2) not SG&A, financing, or an abnormal cost, and thus 3) only makes sense to match to revenue.

can you share the model?

Does that include legal fights and potential payouts to artists and writers whose work was used without permission?

Can anyone explain why it's not allowed to compensate the creators of the data?

Of course not. Those usually wouldn't be considered "margin".

Another similar example is R&D and development by engineers aren't considered in margin either.

It's already questionable whether anyone can make it profitable once you account for all the costs. Why do you think they try to squash the legal concerns so hard? If they move fast and stick their fingers in their ears, they can just steal whatever they want.

[flagged]

why not?

Obviously because you are now allowed to download and share copyrighted works without permission or cost. At least that seems to be the precedent being set in court cases thus far.

Because the law as it stands says they aren't and Congress may make no ex post facto law.

ok, but that's just one country.

I mean for AI literally the only countries involved are the USA and China. I doubt you think China is going to start respecting IP rights anytime soon.

Mistral is in France.

I have to disagree. The biggest cost is still energy consumption, water and maintenance. Not to mention keeping up with rivals at an incredibly high tempo (hence offering billions, like Meta did recently). Then there's the cost of hardware, reflected in Nvidia's skyrocketing shares :) No one should dare to talk about profit yet. Now is the time to grab the market, invest a lot, and work hard, hoping for future profit. The equation is still a work in progress.

The capital costs for the GPU are an order of magnitude larger than the energy consumption. It doesn't matter whether the GPUs are used for training or inference.

Back of the envelope: a $25k GPU amortized over 5 years is $5k/year. A 500W GPU run at full power uses about 4.4 MWh per year; at $0.15/kWh the electricity costs about $650/year.

The other operating costs you suggest have to be even smaller.
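Quick sanity check on those numbers (purchase price, power draw, and electricity rate are the assumptions above):

```python
# Sanity check on the capex-vs-energy back-of-envelope above.
gpu_price = 25_000          # $, assumed purchase price
amortization_years = 5
power_kw = 0.5              # 500 W at full load
price_per_kwh = 0.15        # $/kWh, assumed
hours_per_year = 24 * 365

capex_per_year = gpu_price / amortization_years        # $5,000
energy_kwh = power_kw * hours_per_year                 # ~4,380 kWh (~4.4 MWh)
energy_cost_per_year = energy_kwh * price_per_kwh      # ~$657

print(f"capex:  ${capex_per_year:,.0f}/yr")
print(f"energy: ${energy_cost_per_year:,.0f}/yr ({energy_kwh / 1000:.1f} MWh)")
print(f"ratio:  {capex_per_year / energy_cost_per_year:.1f}x")   # ~7.6x
```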

Is that not baked into the h100 rental costs?

It is.

> The biggest cost is still energy consumption, water and maintenance.

Are you saying that the operating costs for inference exceed the costs of training?

The global cost of inference at both OpenAI and Anthropic surely exceeds the training cost. The reason is simple: inference cost grows with requests, not with datasets. My math, simplified by AI, says: suppose training a GPT-like model costs C_T = $10,000,000 and each query costs C_I = $0.002.

Break-even: N > C_T / C_I = 10,000,000 / 0.002 = 5,000,000,000 inferences.

So after 5 billion queries, inference costs surpass the training cost.

OpenAI claims it has 100 million users; multiply that by queries per user and I'll let you judge.

No. But training an LLM is certainly very very expensive and a gamble every time you do it. I think of it a bit like a pharmaceutical company doing vaccine research…