> I want to build my own video model, just for learning purposes

Sorry if this sounds like a cliché, but try using that exact sentence as a prompt to a deep thinking and learning model, and see what comes out.

A more expensive option: look at Project #5 at https://bytebyteai.com/