I'm not aware of any that do native video-to-vector embedding the way Gemini Embedding 2 does. There are CLIP-based models (like VideoCLIP) that embed frames individually, but they don't model the temporal dimension of video. You'd need to average the frame embeddings, which throws away motion and frame order.
Would love to see open-weight models with this capability since it would eliminate the API cost and the privacy concern of uploading footage.
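To make the frame-averaging point concrete, here is a toy sketch of why mean pooling loses temporal information. The 2-D "embeddings" are made-up numbers, not output from any real model, and `mean_pool` is a hypothetical helper: a clip and the same clip played backwards average to the identical vector.

```python
def mean_pool(frames):
    """Average a list of equal-length frame embeddings into one vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

# Toy per-frame embeddings (illustrative values only, not real model output).
clip = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

forward = mean_pool(clip)         # frames in original order
backward = mean_pool(clip[::-1])  # same frames, reversed in time

# Mean pooling is order-invariant, so both directions collapse to one vector.
print(forward == backward)
```

So "person walks in" and "person walks out" would look the same to a search index built this way, assuming the per-frame embeddings are similar.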
A quick search brought up https://qwen.ai/blog?id=qwen3-vl-embedding but I have no idea if it does what Gemini is doing here.
It works more or less the same way. I made a proof of concept for it: https://github.com/jakejimenez/sentinelsearch
Very cool, thanks. Will check it out.