I mean, Google already has MuZero, which I'm willing to bet has evolved quite a bit in private, because if anything is going to get us closer to actual AI, it's that.
Realistically, one can build an AI capable of reasoning (i.e. recurrent loops with branches) using very basic models that fit on a 3090, in a multi-agent configuration along the lines of https://github.com/gastownhall/gastown. Nobody has done it yet because we don't know how many agents are required or what the prompts for them look like.
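To make "recurrent loop with branches" a bit more concrete, here is a toy sketch. Everything in it is made up for illustration: `proposer`, `critic`, and `reason` are hypothetical names, the "agents" are plain Python functions standing in for small local models, and none of it uses gastown or calls an actual model. It only shows the control flow: loop, branch into candidates, let a second agent pick one, repeat.

```python
from typing import List


def proposer(state: str) -> List[str]:
    """Hypothetical 'agent' 1: branch by proposing candidate next states.

    A real setup would prompt a small model here; this stub just
    appends one of two letters.
    """
    return [state + "a", state + "b"]


def critic(state: str, goal: str) -> int:
    """Hypothetical 'agent' 2: score a candidate against the goal.

    The score is the length of the prefix the candidate shares with the goal.
    """
    score = 0
    for x, y in zip(state, goal):
        if x != y:
            break
        score += 1
    return score


def reason(goal: str, max_steps: int = 20) -> str:
    """Recurrent loop: propose branches, keep the best one, repeat."""
    state = ""
    for _ in range(max_steps):
        if state == goal:
            break
        candidates = proposer(state)  # branch
        state = max(candidates, key=lambda c: critic(c, goal))  # select
    return state


if __name__ == "__main__":
    print(reason("abba"))
```

The open questions from above show up directly in the sketch: how many functions like `proposer` and `critic` you need, and what each one should actually be prompted to do, is exactly the part nobody knows.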
The fundamental philosophical problem is whether that configuration can be arrived at through training, or whether AI agents have to go through the equivalent of "evolution epochs" in a simulated environment before they can do all that. Because the prompts and models in that kind of setup have to be information agnostic.