Hacker News

Simulation is the answer. You just need a model that's decent at economics to independently judge the outcome of each run, unless the model being trained is smart enough to judge its own results. Then it becomes a self-reinforcing training environment.

Now, depending on how good your simulation is, the result may or may not be useful, but that's still how you do it. Something like MuZero: https://en.wikipedia.org/wiki/MuZero
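
To make the loop concrete, here is a rough sketch of the simulate, judge, update cycle. Everything in it is an assumption for illustration: the toy market in `simulate`, the profit-based `judge` standing in for "a model that's decent at economics", and the price list. It is a plain epsilon-greedy bandit, not MuZero, which additionally learns its own model of the environment and plans with tree search.

```python
import random

PRICES = [10, 20, 30, 40, 50]  # hypothetical action space: prices the agent may set
UNIT_COST = 10                 # hypothetical cost per unit sold


def simulate(price, rng):
    """Toy market (assumption): linear demand with a little noise. Returns units sold."""
    return max(0.0, 100 - 2 * price + rng.gauss(0, 1))


def judge(price, units_sold):
    """Stand-in for the independent judge: scores the outcome as profit."""
    return (price - UNIT_COST) * units_sold


def train(steps=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = {p: 0 for p in PRICES}
    values = {p: 0.0 for p in PRICES}  # running mean of judge scores per price
    for _ in range(steps):
        untried = [p for p in PRICES if counts[p] == 0]
        if untried:
            price = untried[0]                     # try every action once first
        elif rng.random() < epsilon:
            price = rng.choice(PRICES)             # explore
        else:
            price = max(PRICES, key=values.get)    # exploit the best estimate so far
        score = judge(price, simulate(price, rng))
        counts[price] += 1
        values[price] += (score - values[price]) / counts[price]
    return max(PRICES, key=values.get), values


best_price, estimates = train()
print(best_price, round(estimates[best_price]))
```

In this toy market the noise-free profit peaks at a price of 30 among the listed options, so that is what the agent should converge on. The point about simulation quality shows up directly: the agent only learns whatever `simulate` and `judge` reward, so if either is wrong, it gets very good at the wrong thing.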