The reason text works is that it has a higher bit rate than speech. This is why many believe that CLI tools are still supreme when it comes to getting things done quickly.

While fun, this game-like interface is too casual, and it certainly has a lower bit rate, which impacts the communication exchange between an AI and the human operator.

It will be a fine abstraction if the goal is to have a high-level overview, though.

Thanks for the comment! We're working towards using the game's own simulation data (from Unity) to feed back into your game's agents. We hope this will prove less noisy than speech / real-world instrument data, allowing the AI to learn more effectively with new data every time you play.