This seems very, very far off. From the latest reports, Anthropic has a gross margin of ~60%; it came out in their latest fundraising story. That one report from The Information estimated OpenAI's GM at ~50% including free users. These are gross margins, so any amortization or model training cost would likely come after that line.

Then, today almost every lab uses methods like speculative decoding and caching, which cut costs and speed things up significantly.
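For intuition on why speculative decoding helps, here is a toy greedy-acceptance sketch (not any lab's actual implementation; `draft_model.next_token` and `target_model.next_tokens` are hypothetical helpers, and real systems use a rejection-sampling acceptance rule): a cheap draft model proposes a few tokens, the big model verifies them in a single forward pass, and you only fall back to the big model's own token where they disagree.

    # Toy sketch of (greedy) speculative decoding; only shows the structure.
    def speculative_step(prefix, draft_model, target_model, k=4):
        # 1) Cheap draft model proposes k tokens autoregressively.
        draft, ctx = [], list(prefix)
        for _ in range(k):
            t = draft_model.next_token(ctx)          # hypothetical helper
            draft.append(t)
            ctx.append(t)

        # 2) Big model scores prefix + draft in ONE forward pass
        #    (that single pass is where the savings come from).
        verified = target_model.next_tokens(prefix, draft)  # hypothetical helper

        # 3) Accept the longest agreeing run, then take the big model's token.
        accepted = []
        for d, v in zip(draft, verified):
            if d == v:
                accepted.append(d)
            else:
                accepted.append(v)                   # big model's correction
                break
        return prefix + accepted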

The input numbers are far off. The assumption is 37B active parameters. Sonnet 4 is supposedly a 100B-200B-param model; Opus is supposedly about 2T params. Neither of them (even if we assume MoE) will have exactly that number of active params. Then there is a cost to hosting and activating params at inference time (the article kind of assumes a constant 37B active params throughout).
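To make the hosting-vs-activation point concrete, here's a toy calc (the sizes are just the community guesses above; bytes-per-weight and GPU memory are assumptions): total params set the memory footprint, i.e. how many GPUs you need just to hold the weights, while active params set the compute per token.

    # Toy footprint calc: 80 GB GPUs needed just to HOLD the weights,
    # ignoring KV cache, activations and redundancy. Sizes are guesses.
    BYTES_PER_PARAM = 2          # assume bf16/fp16 weights
    GPU_MEM_GB = 80              # assume 80 GB per GPU (H100-class)

    def gpus_to_hold(total_params):
        weight_gb = total_params * BYTES_PER_PARAM / 1e9
        return weight_gb / GPU_MEM_GB

    for name, total in [("37B", 37e9),
                        ("200B (Sonnet guess)", 200e9),
                        ("2T (Opus guess)", 2e12)]:
        print(f"{name:>20}: ~{gpus_to_hold(total):5.1f} GPUs just for weights")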

Gross margins also don't tell the whole story: we don't know how much Azure and Amazon charge for the infrastructure, and we have reason to believe they're selling it at a massive discount (Microsoft definitely does, as follows from its agreement with OpenAI). Microsoft gets the model, OpenAI gets discounted infra.

A discounted Azure H100 will still be more than $2 per hour; same goes for AWS. Trainium chips are new and not as effective (not saying they're bad), but they still cost in the same range.

For inference, gross margin is exactly (what the company charges the user per 1M tokens) minus (the direct cost to produce those 1M tokens, which is basically GPU cost).
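As a worked (entirely made-up) example of that formula, with an assumed $2/GPU-hour, an assumed aggregate throughput, and an assumed API price:

    # Toy gross-margin calc per 1M output tokens. Every number is an
    # assumption for illustration, not a disclosed provider figure.
    gpu_cost_per_hour = 2.00         # assumed $/GPU-hour
    tokens_per_sec_per_gpu = 100     # assumed aggregate (batched) throughput
    price_per_1m_tokens = 15.00      # assumed API price per 1M output tokens

    seconds_per_1m = 1_000_000 / tokens_per_sec_per_gpu
    gpu_cost_per_1m = gpu_cost_per_hour * seconds_per_1m / 3600

    margin = price_per_1m_tokens - gpu_cost_per_1m
    print(f"GPU cost per 1M tokens: ${gpu_cost_per_1m:.2f}")
    print(f"gross margin: ${margin:.2f} ({margin / price_per_1m_tokens:.0%})")

Note that halving the assumed GPU price roughly halves the cost line, which is why the discount question matters so much.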

I'm implying that what OpenAI pays per GPU-hour is much less than $2 because of the discount. That's an assumption. It could be $1, or $0.50, no?

It could still be burning money for Microsoft/Amazon.

Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on?

Oh, nothing official. There are people who estimate the sizes based on tok/s, cost, benchmarks, etc. The one most go on is https://lifearchitect.substack.com/p/the-memo-special-editio.... That guy estimated Claude 3 Opus to be a 2T-param model (given the pricing + speed). Opus 4 is 1.2T params according to him (though then I don't understand why the price stayed the same). Sonnet is estimated by various people to be around 100B-200B params.

[1]: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJ...

If you're using the API cost of the model to estimate its size, then you can't turn around and use that size estimate to estimate the inference cost.

tok/s cannot in any way be used to estimate parameter count. It's a tradeoff made at inference time: you can adjust your batch size to serve one user at a huge tok/s or many users at a slow tok/s each.
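As a toy illustration (numbers made up): if you treat a saturated replica as having a roughly fixed aggregate throughput, per-user speed is just that aggregate divided by the batch size, so the tok/s you observe mostly reflects a serving decision, not model size.

    # Toy batching tradeoff: assume a saturated replica sustains a roughly
    # fixed aggregate throughput (a simplification). Numbers are made up.
    aggregate_tok_per_sec = 5000     # assumed saturated replica throughput

    for batch_size in (1, 8, 64, 256):
        per_user = aggregate_tok_per_sec / batch_size
        print(f"batch={batch_size:>3}: ~{per_user:7.1f} tok/s per user")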

Not everyone uses MoE architectures. It's not outlandish at all...

There's no way Sonnet 4 or Opus 4 are dense models.

Citation needed

Common sense:

- The compute requirements would be massive compared to the rest of the industry (see the rough per-token comparison after this list)

- Not a single large open source lab has trained anything over 32B dense in the recent past

- There is considerable crosstalk between researchers at large labs; notice how all of them seem to be going in similar directions all the time. If dense models of this size actually provided a benefit over MoE, the info would've spread like wildfire.
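To put a rough number on the compute point (all sizes hypothetical, using the common 2 x active-params FLOPs-per-token rule of thumb): a 2T dense model has every parameter active, while a 2T-total MoE with, say, 40B active parameters does roughly 50x less work per generated token.

    # Rough per-token decode compute, dense vs MoE (hypothetical sizes).
    # Rule of thumb: FLOPs per generated token ~ 2 * active parameters.
    def tflops_per_token(active_params):
        return 2 * active_params / 1e12

    dense_2t = tflops_per_token(2e12)    # 2T dense: everything is active
    moe_2t   = tflops_per_token(40e9)    # 2T total, assumed 40B active

    print(f"2T dense:            ~{dense_2t:.2f} TFLOPs/token")
    print(f"2T MoE (40B active): ~{moe_2t:.3f} TFLOPs/token")
    print(f"ratio:               ~{dense_2t / moe_2t:.0f}x")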