Hacker News

We've already hit RAM power and size limits (about 40k electrons which is the limit before we get noise messing up the amplifier).

If a model needs 2x more memory, but serves the same number of customers, the cost is going to go up per customer to cover the increased hardware and power costs. Companies are starting to implement AI limits to keep costs under control.

Anthropic and OpenAI are rumored to be considering cutting inference prices trying to retain customers as LLMs commoditize and race to the bottom. It reminds me of the Chinese bike wars where bike-share companies were losing massive amounts of money, but kept running sales and lowering prices in an attempt to compete and drive out their competitors. The end of that story was a bunch of major bankruptcies and giant bike graveyards.

Nvidia's hard pivot to "in the near future, everyone will run their AI at home" seems to indicate that they also see the market shifting. We've already had AI ingest everything out there. The real challenge becomes how to better optimize their algorithm to get more useful data in less space.