I am in no way a tech-savvy person. I don't know coding, and I don't know much about networking or AI either. But I definitely want a system like this: an AI-powered gallery / video repository that can help me find moments, people, colors, and objects across hundreds of thousands of files.
Local LLMs sound so cool, but I know they won't be easy to set up or use for an average Joe like me.
Immich can do part of this. For photos it does ML object detection and OCR for text. For video, I think it currently only looks at the first frame. It also has face / people detection.
And once it's set up, it's easy to use, even for non-technical people.