Obviously, there’s a limit to how much you can squeeze into a single parameter. I guess the low-hanging fruit will be picked soon, and scaling will continue via algorithmic improvements in training, like [1], that keep the training compute feasible.

I take "you can't have human-level intelligence without roughly the same number of parameters (hundreds of trillions)" as the null hypothesis: assumed true until proven otherwise.
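For a sense of scale, here is a quick back-of-envelope sketch (my own assumptions, not claims from [1]): ~100 trillion parameters, fp16 weights, the common ≈6 FLOPs per parameter per token approximation, and a Chinchilla-style ~20 training tokens per parameter.

    # Rough sketch only; every number below is an assumption for illustration.
    params = 100e12                  # ~100T parameters, the low end of "hundreds of trillions"
    bytes_per_param = 2              # fp16/bf16 weights only; no optimizer state or activations
    tokens = 20 * params             # Chinchilla-style heuristic: ~20 training tokens per parameter

    weight_memory_tb = params * bytes_per_param / 1e12   # bytes -> terabytes
    train_flops = 6 * params * tokens                     # ~6 FLOPs per parameter per token

    print(f"weights alone: ~{weight_memory_tb:,.0f} TB")  # ~200 TB
    print(f"training compute: ~{train_flops:.1e} FLOPs")  # ~1.2e+30 FLOPs

Under those assumptions the weights alone are hundreds of terabytes and the training run is far beyond anything done today, which is the gap that algorithmic improvements would have to close for training to stay feasible.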

[1] https://arxiv.org/html/2602.15322v1