It's pretty simple... the word circle and what you can correlate to it via english language description has somewhat less to do with reality than a physical 3D model of a circle and what it would do in an environment. You can't just add more linguistic description via training data to change that. It doesn't really matter that you can keep back propagating because what you are back propagating over is fundamentally and qualitatively less rich.