From what I see the code downsamples video to 5 fps, so 1 hour of video is 3600 seconds * 5 fps = 18,000 frames. 18,000 frames * $0.00079/frame = $14.22. A couple dollars more with the overlap.
(The code also tries to skip "still" frames, but if your video is dynamic you're looking at the cost above.)
you're right that the code uses ffmpeg to downsample the chunks to 5fps before sending them, but that's only a local/bandwidth optimization, not what the api actually processes.
regardless of the file's frame rate, the gemini api natively extracts and tokenizes exactly 1 fps. the 5 fps downscaling just keeps the payload sizes small so the api requests are fast and don't timeout.
i'll update the readme to make this more clear. thanks for bringing this up.
Thanks for the details and correction.