Unfortunately, the feedback loop for robotics is many orders of magnitude slower than for math or coding problems. And once you move to artificial environments, you are learning artificial dynamics, with the same limitations as the benchmarks.

Moravec's paradox