That's the part that people are missing: it won't get smaller. It already required heroic optimization to fit an 8B model on one megachip. Taalas is more expensive but faster. It is cheaper per token when running 24/7, but it is not cheap to buy. It will never be small and never be cheap.

"It will never be small and never be cheap."

Will your comment age well? We'll see.

We might all be surprised if models somehow come down drastically in size (ternary weights, perhaps?). The shrink doesn't have to come from the hardware getting denser.
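
For a rough sense of scale, here is a back-of-the-envelope sketch of raw weight storage at different precisions. The 8B parameter count is the figure from upthread; the bit widths are illustrative assumptions, and the ternary figure is the ideal log2(3) bits per weight, not what any real packing scheme achieves. It says nothing about how Taalas actually stores weights or whether a ternary model of that size would be any good.

```python
import math

PARAMS = 8e9  # the 8B model size mentioned upthread


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB (10^9 bytes).

    Ignores activations, KV cache, and any packing or metadata overhead.
    """
    return params * bits_per_weight / 8 / 1e9


# Illustrative precisions (assumptions, not anything Taalas has stated).
FORMATS = {
    "fp16": 16.0,
    "int8": 8.0,
    "int4": 4.0,
    "ternary (ideal)": math.log2(3),  # lower bound for three states per weight
}

for name, bits in FORMATS.items():
    print(f"{name:>16}: {weight_gigabytes(PARAMS, bits):6.2f} GB")
```

Going from 16-bit to ideal ternary weights is about a 10x cut in storage for the same parameter count, which is the kind of shrink that wouldn't depend on denser hardware.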