I'm fascinated by how the economy is catching up to demand for inference. The vast majority of today's capacity comes from silicon that merely happens to be good at inference, and it's clear that there's a lot of room for innovation when you design silicon for inference from the ground up.

With CapEx going crazy, I wonder where costs will stabilize and what OpEx will look like once these initial investments are paid back (or go bust). The consensus seems to be that there will be a rug pull and frontier-model inference costs will spike, but I'm not entirely convinced.

I suspect it largely comes down to how much more efficient custom silicon is than GPUs, and how accurately the supply chain can predict future demand relative to future efficiency gains. To me, it is not at all obvious what will happen. I see no reason why a rug pull is any more or less likely than today's supply chain overestimating tomorrow's capacity needs and creating a hardware (and maybe energy) surplus in 5-10 years.