This seems like something that would be very expensive to run. Do you have some representative figures at a particular resolution and frame rate?
The project's README on GitHub has a section on this[0]:
>Indexing 1 hour of footage costs ~$2.84 with Gemini's embedding API (default settings: 30s chunks, 5s overlap):
>1 hour = 3,600 seconds of video = 3,600 frames processed by the model. 3,600 frames × $0.00079 = ~$2.84/hr
>The Gemini API natively extracts and tokenizes exactly 1 frame per second from uploaded video, regardless of the file's actual frame rate. The preprocessing step (which downscales chunks to 480p at 5fps via ffmpeg) is a local/bandwidth optimization — it keeps payload sizes small so API requests are fast and don't timeout — but does not change the number of frames the API processes.
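To put numbers on other clip lengths, the README's arithmetic can be written as a small helper. This is only a sketch: the function and constant names are made up here, the per-frame price and the 1 frame per second sampling rate are taken from the quoted README, and like the README's figure it ignores any extra frames that the 5s chunk overlap might add.

```python
# Figures quoted from the README; check current pricing before relying on them.
PRICE_PER_FRAME_USD = 0.00079   # cost per frame processed by the embedding API
API_FRAMES_PER_SECOND = 1       # the API samples 1 frame per second of video


def estimate_indexing_cost(duration_seconds: float) -> float:
    """Estimate the indexing cost in USD for a clip of the given length.

    Hypothetical helper: resolution and source frame rate are not inputs
    because, per the README, they do not change the billed frame count.
    """
    frames = duration_seconds * API_FRAMES_PER_SECOND
    return frames * PRICE_PER_FRAME_USD


if __name__ == "__main__":
    for hours in (1, 8, 24):
        cost = estimate_indexing_cost(hours * 3600)
        print(f"{hours:>2} h of footage: ~${cost:.2f}")
```

So, taking the README at its word, the cost scales with the duration of the footage, not with its resolution or frame rate.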
[0] https://github.com/ssrajadh/sentrysearch#cost