This looks extremely impressive, really deserves more attention here.

Are the inverse dynamics and forward dynamics models trained separately? It sounds like if the inverse dynamics model is meant to extrapolate more training data, then perhaps all that means is it takes very little data to generalize directly with the forward dynamics model assuming the right architecture.

thanks! the inverse dynamics model is trained first on 40k hours of data and then frozen to label all 11 million hours. yup! the idea is that it should take a small amount of data to generalize environment dynamics, then you can use a lot of data to understand actions.