> The problem is going to be how to control those models to produce a universe that's temporally and spatially consistent.
Why not just have a simple, low-poly rasterizer and have AI fill in the details?
That's essentially the way that AMD FSR and NVIDIA DLSS work today, although they take fully rendered (lower-resolution) frames as input rather than low-poly geometry.