Wouldn't you deal with spatial reasoning by giving it access to a tool that structures the space in a way it can understand, or that is itself a sub-model that can do spatial reasoning? These "general" models would serve as the frontal cortex while other models do the specialized work. What is missing?
That's a bit like saying just give blind people cameras so they can see.
I mean, no, not really. These models can see; you're giving them eyes to connect to that part of their brain.
They should train more on sports commentary; perhaps that could give spatial reasoning a boost.