Great points! We do keep seeing gains from larger model sizes, and I think limited size is still one of the factors contributing to jagged intelligence. Once models reach around 100T parameters, which is roughly human-level complexity if you go by the commonly cited synapse count of the brain, I assume there will be no trace of jaggedness left.
If you look at things like Mythic AI and the recent wurtzite ferroelectric nitrides breakthrough from the University of Michigan, it looks like huge performance and efficiency gains from new compute-in-memory approaches are around the corner.
And that should get us up to two orders of magnitude more parameters.
It's also plausible to me that before we get all the way to 100T, we find some recipe for efficient state synchronization, goal sharing, or something similar that lets us get a higher collective IQ by combining fast, distributed predictive subnetworks.
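
For a sense of scale on the 100T figure, here is a rough back-of-envelope sketch in Python of how much memory the weights alone would need. The 1T baseline is just a hypothetical starting point I picked so that "two orders of magnitude" lands on 100T, and `weight_terabytes` is an illustrative helper, not anything from a real library. It ignores activations, optimizer state, and everything else.

```python
# Back-of-envelope: memory needed to hold model weights at a given precision.
# The byte widths are the standard sizes of these number formats.
# The 1T baseline below is an assumption for illustration only.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_terabytes(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bytes_per_param / 1e12


baseline_params = 1e12                  # assumed 1T-parameter starting point
target_params = baseline_params * 100   # two orders of magnitude more = 100T

for name, width in BYTES_PER_PARAM.items():
    tb = weight_terabytes(target_params, width)
    print(f"{name:>9}: {tb:,.0f} TB for weights alone")
```

At 16-bit precision that works out to 200 TB just for the weights, which is part of why moving compute into the memory itself looks so attractive at this scale.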