Optimal learning is an interesting problem in computer science because it is fundamentally bound by geometric space complexity rather than computational complexity. You can bend the curve, but the approximations degrade rapidly and still carry a prohibitively expensive exponential space complexity. There is literature on this; a lot of the algorithmic information theory work in AI was about characterizing these limits.
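
As a rough, hand-wavy sketch of the kind of blow-up that algorithmic information theory formalizes (my own toy illustration, not a result from that literature): an exhaustive, Solomonoff-style learner has to keep a table of every hypothesis still consistent with its observations, and that table grows exponentially with the prediction horizon before it sees any data at all.

    from itertools import product

    def consistent_hypotheses(observed_bits, horizon):
        """All length-`horizon` bit strings whose prefix matches the observations."""
        return [h for h in product("01", repeat=horizon)
                if h[:len(observed_bits)] == tuple(observed_bits)]

    # 2^20 (~1e6) candidate hypotheses just to predict 20 bits; the table
    # you must materialize grows as 2^horizon regardless of how the data
    # later narrows it down. (Illustrative only.)
    horizon = 20
    observed = list("1101")
    survivors = consistent_hypotheses(observed, horizon)
    print(f"hypothesis table size: {2 ** horizon}, still consistent: {len(survivors)}")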

The annoying property of prohibitively exponential (ignoring geometric) space complexity is that it places a severe bound on computational complexity per unit time. The exponentially increasing state has to physically exist somewhere, so each sequentially dependent operation has to traverse an ever-larger volume, with latency bounded at the limit by the speed of light. Even if you can afford the insane space requirements, your computation can't afford the aggregate latency for anything useful, even for the most trivial problems. With highly parallel architectures this can be turned into a latency-hiding problem to some extent, but that also has limits.
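
A crude back-of-envelope version of that latency argument (my numbers, not anything from the literature: assume one state cell per cubic nanometer, packed into a sphere, and take the one-way light travel time across its radius as the latency floor):

    import math

    C = 2.998e8            # speed of light, m/s
    CELL_VOLUME = 1e-27    # assumed: one state cell per cubic nanometer (m^3)

    def light_latency_seconds(num_cells: float) -> float:
        """One-way light-speed latency across a sphere holding num_cells cells."""
        volume = num_cells * CELL_VOLUME
        radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
        return radius / C

    # State spaces of 2^n cells: the radius (and thus the latency floor for
    # any sequentially dependent operation) grows like 2^(n/3).
    for bits in (60, 100, 200, 300):
        print(f"2^{bits} cells -> latency floor ~ {light_latency_seconds(2.0 ** bits):.3e} s")

At 2^200 cells that floor is already minutes per dependent step, and at 2^300 it is tens of thousands of years, which is the sense in which you can't buy your way out with more hardware.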

This was thoroughly studied by the US defense community decades ago.

The tl;dr is that efficient learning scales extremely poorly, more poorly than I think people intuit. All of the super-intelligence hard-takeoff scenarios? Not going to happen; you can't make the physics work without positing magic that circumvents the reality of latencies when your state space is unfathomably large, even with unimaginably efficient computers.

I harbor a suspicion that the cost of this scaling problem, and the limitations of wetware, have bounded intelligence in biological systems. We can probably do better in silicon than wetware in some important ways, but there is not enough intrinsic parallelism in the computation to adequately hide the latency.

Personally, I find these “fundamental limits of computation” things to be extremely fascinating.

So I studied Machine Learning too. One of the main things I learned is that for any problem there is an ideally sized model that when trained will produce the lowest error rate. Now, when you do multi-class learning (training a model for multiple problems), that ideally sized model is larger, but there is still an optimum model size. It seems to me that for GAI there will also be an ideally sized model, and I wouldn't be surprised if the complexity of that model were very similar to that of the human brain. If that is the case, then some sort of super-intelligence isn't possible in any meaningful way. This would seem to track with what we are seeing in today's LLMs: when they build bigger models, they often don't perform as well as the previous one, which perhaps was already at some maximum/ideal complexity. I suspect we will continue to run into this barrier over and over again.
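
For what it's worth, the classical picture I'm describing is easy to reproduce on toy data (toy example, arbitrary degrees and noise level; not a claim about LLM scaling): held-out error typically bottoms out at some intermediate model size.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n):
        x = rng.uniform(-1, 1, n)
        y = np.sin(3 * x) + rng.normal(0, 0.3, n)   # true signal + noise
        return x, y

    x_train, y_train = make_data(25)
    x_test, y_test = make_data(500)

    # Sweep model "size" (polynomial degree); held-out MSE usually falls,
    # bottoms out, then rises again as the model starts fitting noise.
    for degree in (1, 3, 5, 9, 15):
        coeffs = np.polyfit(x_train, y_train, degree)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree:2d}: held-out MSE = {test_mse:.3f}")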

> for any problem there is an ideally sized model that when trained will produce the lowest error rate.

You studied ML before the discovery of "double descent"?

https://youtu.be/z64a7USuGX0

I did; however, I have also observed ideally sized models since then, in algorithms designed with knowledge of it.

Any reference material (papers/textbooks) on that topic? It does sound fun.