Autoencoders should output these kinds of splats instead of pixel outputs and likely obtain better representations of the world at the bottleneck. These features can be used for downstream tasks.