It looks like the purpose of this model is to either (i) generate environment simulation data for doing RL on other models, or (ii) act as a foundation model (it seems they trained it to select actions as well as predict the next state in the same loop?).
Either way, neither is intended for end consumers.