Pretty terribly expensive way to watch a video with Claude.

Use Gemini or some local VLM to do this way more efficiently. We spent quite a bit of time on video understanding, and Claude will just burn tokens.

Check out this library: https://vlm-run.github.io/mm/

You can swap models and try out different encoding methods for videos (https://vlm-run.github.io/mm/encoders/#video)

Exactly this. Gemini is best at this. Just give it a video link (YouTube works best) and it will analyse the video.
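For anyone who wants to try the "just give it a link" approach from code, here is a minimal sketch of the request body. It assumes the Gemini REST `generateContent` shape as I remember it from the docs (a `file_data` part with a `file_uri`, followed by a text part); the helper name `build_video_request` and the `VIDEO_ID` placeholder are made up for illustration. It only builds the JSON and makes no network call.

```python
import json


def build_video_request(video_url: str, question: str) -> dict:
    """Build a generateContent-style body pointing the model at a video URL.

    Assumption: the field names below mirror the Gemini REST API docs;
    check the current docs before relying on them.
    """
    return {
        "contents": [
            {
                "parts": [
                    # The video is passed by reference, not uploaded.
                    {"file_data": {"file_uri": video_url}},
                    # The question about the video goes in a text part.
                    {"text": question},
                ]
            }
        ]
    }


if __name__ == "__main__":
    body = build_video_request(
        "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder, not a real video
        "Does this video contain cars? Answer yes or no.",
    )
    print(json.dumps(body, indent=2))
```

You would POST that body to the `generateContent` endpoint of whichever model you picked, with your API key in the request headers.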

Really, does this work now? What about NotebookLM? I was using it a lot until I realised it was only analysing the transcripts and not the video itself, which mattered because I was mostly using it for technical videos with important charts.

It can tell you what’s on the screen at a given point in time. My pipeline is mostly built around simple questions like “does this video contain cars?” Not sure if it can spot charts on screen.

NotebookLM still uses the transcript method, I think. But Gemini is wonderful. I have been using it to analyze YouTube videos of wrestling matches (trying to build a fan website for WXM, the best pro wrestling promotion to come out of India in a while). It does move-by-move analysis, tracks match flow based on audience reaction, and isolates the interesting parts of the video (big moves, botches, story beats, etc.). I have run some experiments to get video editing plans out of it. I think I can combine it with something like a Remotion skill to make highlight videos.

Edit: BTW, you can analyze about 8 hours a day on free tier.

Seems cool from the docs page, I was about to give it a shot but https://github.com/vlm-run/mm goes 404 …

It’s unclear if that’s intentional, since it’s also listed under open source on the main company site: https://www.vlm.run/open-source/mm

Do you mean that Gemini is the most token-efficient at watching videos? Is that the case for e.g. just giving it a video in the browser? I admit, I don't give LLMs videos as I just assume it'll burn too many tokens.

Yes, Gemini is very token efficient at video. It also has "lower resolution" options which can make it even cheaper. With Gemini 3.1 Flash Lite, an hour of video works out to $0.24 at the API rates.
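If you want to sanity-check a per-hour number like the $0.24 above for your own setup, the arithmetic is just seconds × tokens per second × price per token. A rough sketch follows; the 300 tokens per second used in the example is an assumption (it is the ballpark I recall from older Gemini docs for default resolution, and it may well differ per model and resolution setting), and the function names are mine.

```python
def video_tokens(seconds: float, tokens_per_second: float) -> float:
    """Total input tokens for a video of the given length."""
    return seconds * tokens_per_second


def video_cost_usd(
    seconds: float, tokens_per_second: float, usd_per_million_tokens: float
) -> float:
    """Input cost in USD for a video, given a price per million tokens."""
    return video_tokens(seconds, tokens_per_second) / 1_000_000 * usd_per_million_tokens


def implied_price_per_million(
    cost_usd: float, seconds: float, tokens_per_second: float
) -> float:
    """Back out the per-million-token price implied by a known total cost."""
    return cost_usd / (video_tokens(seconds, tokens_per_second) / 1_000_000)


if __name__ == "__main__":
    ONE_HOUR = 3600
    ASSUMED_TOKENS_PER_SECOND = 300  # assumption, not a confirmed figure

    tokens = video_tokens(ONE_HOUR, ASSUMED_TOKENS_PER_SECOND)
    price = implied_price_per_million(0.24, ONE_HOUR, ASSUMED_TOKENS_PER_SECOND)
    print(f"tokens per hour of video: {tokens:,.0f}")
    print(f"implied USD per million input tokens: {price:.3f}")
```

Swap in the token rate and price from the current pricing page for whatever model and resolution you actually use; a lower-resolution setting shrinks `tokens_per_second` and the cost drops proportionally.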

Assuming that's your project, the GitHub link from the PyPI page is a 404.