why not skip the text conversion? is it usable at all?
gemini embedding 2 converts straight video to vectors. in this case, dashcam clips don't have audio to transcribe and even if they did, it would be useless in the search
What are the SoA audio models right now?
gemini embedding 2 converts straight video to vectors. in this case, dashcam clips don't have audio to transcribe and even if they did, it would be useless in the search
What are the SoA audio models right now?