Here's a recent paper showing that models trained to generate videos develop strong geometric representations and understanding:
Here's a recent paper showing that models trained to generate videos develop strong geometric representations and understanding: