NotebookLM still uses the transcript method, I think. But Gemini is wonderful. I have been using it to analyze YouTube videos of wrestling matches (I'm trying to build a fan website for WXM, the best pro wrestling promotion to come out of India in a while). It does move-by-move analysis, tracks match flow based on audience reaction, and isolates interesting parts of the video (big moves, botches, story beats, etc.). I have run some experiments to get video editing plans out of it. I think I can combine it with something like the Remotion skill to make highlight videos.
Edit: BTW, you can analyze about 8 hours of video a day on the free tier.