Well, that assumes that if you just keep throwing more data and compute at large language models you'll end up with something akin to AGI to control those robots. Which is far from guaranteed.

LLMs already solved the "System 2" part of this, to borrow from Kahneman; it's the "System 1" part that's lagging behind here. Current Claude/Gemini/ChatGPT is more than enough to tell a robot what chores to do, what to do with a thing, how, where to put it, etc. What's still missing is the ability to translate those goals reliably and safely into robot movements in the diverse, tight environment of a typical house or apartment.

No, you're assuming that you need AGI to control a robot, when LLMs have already shown you don't need anything close to AGI to hold a conversation.

So why do you suddenly think you need it for controlling a body when animals do it with far less?

LLMs aren't the specific architecture you'd use, but it very much looks like a tractable engineering problem to go from a university research lab project that can manage to fold clothes as a demo to a sellable consumer product. The timeline is uncertain, and no one knows if it's gonna take 3 years or 30, but it's not going to take an unknown breakthrough in materials science and physics the way nuclear fusion looks like it will.