Hacker News

I’d like to see embedding of actual video clips become practical in this type of workflow.

Frame-level embedding covers a lot, but it can miss many action-related searches.

Sure. I'm using Qwen2.5-VL (https://huggingface.co/collections/Qwen/qwen25-vl), which helps me understand actions like falling down: I can pass it, for example, 5 frames (downscaled to 720p) so it can work out what is happening in that part of the video.
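
For anyone wanting to try the same idea, here is a rough sketch of just the frame-selection step: pick 5 evenly spaced frame indices from a segment and work out the 720p target size. The function names and the choice of segment midpoints are my own assumptions, not anything from Qwen's tooling, and decoding the frames and calling the model are left out.

```python
def sample_frame_indices(start, end, n=5):
    """Return n frame indices spread evenly over [start, end).

    Assumption: each index sits at the midpoint of one of n equal
    slices of the segment, so the very first and last frames are skipped.
    """
    if n <= 0 or end <= start:
        return []
    step = (end - start) / n
    return [start + int((i + 0.5) * step) for i in range(n)]


def downscale_size(width, height, target_height=720):
    """Return (width, height) scaled down to target_height, keeping aspect ratio.

    Frames that are already at or below the target height are left alone.
    The width is rounded down to an even number, which many codecs expect.
    """
    if height <= target_height:
        return width, height
    new_width = round(width * target_height / height)
    new_width -= new_width % 2
    return new_width, target_height


# Hypothetical example: a 120-frame segment of a 1080p video.
indices = sample_frame_indices(0, 120, n=5)
size = downscale_size(1920, 1080)
```

The selected frames, resized to that size, would then go to the vision-language model together with a prompt asking what happens in the clip. Sampling more frames per segment catches shorter actions but costs more tokens per request.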