Huh.

I feel oddly skeptical about this article; I can't specifically argue the numbers, since I have no idea, but... there are some decent open source models; they're not state of the art, but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices?

The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?

Surely if it's this cheap, and we're talking massive margins according to this, I should be able to get a cheap hosted 600B param model, or run my own.

Am I missing something?

It seems that reality (i.e. the absence of people actually doing things this cheap) is the biggest critic of this set of calculations.

> but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices

There are multiple API providers offering models at dirt cheap prices, enough so that there is at least one well-known aggregator of other API providers that offers lots of models at $0.

> The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1?

https://openrouter.ai/deepseek/deepseek-r1-0528:free

How is this possible? I imagine someone is finding some value in the prompts themselves, but this can't possibly be paying for itself.

Inference is just that cheap, plus they hope that you'll start using the ones they charge for as you get more used to having AI in your workflow.

you can also run deepseek for free on a modestly sized laptop

At 4-bit quant, R1 takes 300+ gigs just for weights. You can certainly run smaller models into which R1 has been distilled on a modest laptop, but I don't see how you can run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension.
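
Rough weights-only math backing that up, assuming the commonly cited ~671B total parameter count:

```python
# Weights-only memory for DeepSeek-R1 at 4-bit quantization.
# Assumes the commonly cited ~671B total parameter count.
params = 671e9
bytes_per_param = 0.5  # 4 bits per parameter
print(f"{params * bytes_per_param / 2**30:.0f} GiB")  # ~312 GiB, before KV cache/activations
```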

You're probably thinking of what ollama labels "deepseek" which is not in fact deepseek, but other models with some deepseek distilled into them.

> why aren't there multiple API providers offering models at dirt cheap prices?

There are. Basically every provider's R1 prices are cheaper than estimated by this article.

https://artificialanalysis.ai/models/deepseek-r1/providers

The cheapest provider in your link charges 460x more for input tokens than the article estimates.

> The cheapest provider in your link charges 460x more for input tokens than the article estimates.

The article estimates $0.003 per million input tokens; the cheapest on the list is $0.46 per million. The ratio is roughly 150×, not 460×.
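
Checking the arithmetic from those two numbers:

```python
# Ratio of the cheapest listed provider price to the article's estimate.
article_estimate = 0.003  # $ per million input tokens, per the article
cheapest_listed = 0.46    # $ per million input tokens, cheapest on the page
print(f"{cheapest_listed / article_estimate:.0f}x")  # ~153x
```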

OTOH, all of the providers are far below the estimated $3.08 cost per million output tokens.

There are 7 providers on that page with a higher output token price than $3.08. There is even one with a higher input token price than that. So that "all" is not true either.

> I should be able to get a cheap hosted 600B param model, or run my own.

if the margins on hosted inference are 80%, then you need > 20% utilization of whatever you build for yourself for this to be less costly to you (on margin).
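
a minimal sketch of that break-even math, assuming (generously) that your hardware at full utilization matches the provider's unit cost:

```python
# Break-even utilization for self-hosting vs. an API with an 80% margin.
# Hypothetical assumption: your hardware at 100% utilization matches the
# provider's unit cost; real self-hosting is usually less efficient.
margin = 0.80
api_price = 1.0                           # normalized $ per token
provider_cost = api_price * (1 - margin)  # what it costs the provider: 0.20

def self_host_cost(utilization):
    # Idle hardware still costs money, so $/token scales with 1/utilization.
    return provider_cost / utilization

for u in (0.05, 0.20, 0.50):
    print(f"utilization {u:.0%}: ${self_host_cost(u):.2f}/token vs API ${api_price:.2f}/token")
# Self-hosting wins only when utilization > 1 - margin = 20%.
```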

i self-host open weight models (please: deepseek et al aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10s instead of 10m, i'll be paying $30000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time.

coordinating a group buy for that $30000 GPU and sharing that across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide.

I also have no idea on the numbers. But I do know that these same companies are pouring many billions of dollars into training models, paying very expensive staff, and building out infrastructure. These costs would need to be factored in to come up with the actual profit margins.

There are; I screenshotted DeepInfra in the article, but there are a lot more: https://openrouter.ai/deepseek/deepseek-r1-0528

is that a quantized model or the full r1?

Imo the article is totally off the mark since it assumes users on average do not go over 1M tokens per day.

Afaik OpenAI doesn't enforce a daily quota even on the $20 plans unless the platform is under pressure.

Since I often consume 20M tokens per day, one can assume many would use far more than the 1M tokens assumed in the article's calculations.

There's zero basis for assuming any of that. The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage.

It is very likely that you are in the top 10% of users.
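
A toy simulation of that kind of distribution (the shape parameter is invented for illustration, not measured from any real usage data):

```python
# Sample per-user daily token usage from a heavy-tailed Pareto distribution
# and measure what share of total usage the top 10% of users account for.
import random

random.seed(0)
usage = sorted((random.paretovariate(1.1) for _ in range(100_000)), reverse=True)
top_decile = sum(usage[: len(usage) // 10])
print(f"top 10% of users: {top_decile / sum(usage):.0%} of all usage")
# Heavy-tailed: the bulk of usage comes from a small fraction of users.
```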

True. The article also has zero basis for its estimates of average usage across each tier's user base.

I somewhat doubt my usage is so close to the edge of the curve, since I don't even pay for any plan. It could be that I'm very frugal with money and heavy on consumption while most are more balanced, but 1M tokens per day in any case sounds slim for any user who pays for the service.

Meanwhile, I don’t use ChatGPT at all on a median day. I use it in occasional bursts when researching something.

https://openrouter.ai/deepseek/deepseek-chat-v3.1

They are dirt cheap. Same model architecture for the comparison: $0.30/M input, $1.00/M output. Or even $0.20/$0.80 from another provider.

Another giant problem with this article is that we have no idea what optimizations they use on their end. There are some wildly complex optimizations these large AI companies use.

What I'm trying to say is that hosting your own model is in an entirely different league from what the pros do.

Even if errors in the article mean the real costs are higher, I would argue it still comes back to being profitable, simply because of how advanced inference optimization has become.

If actual model intelligence is not a moat (looking likely this is true), the real secret sauce of profitable AI companies is advanced optimization across the entire stack.

OpenAI is NEVER going to release their specialized kernels, routing algos, quantizations, or model compilation methods. These are all really hard and really specific.

I would not be surprised if the operating costs are modest.

But these companies also have very expensive R&D and large upfront costs.

https://lambda.chat

Deepseek R1 for free.

* distilled R1 for free

> I'm here to provide helpful, respectful, and appropriate content for all users. If you have any other requests or need assistance with a different type of story or topic, feel free to ask!