What model do you recommend for image search?

Not OP, but CLIP from OpenAI (2021) seems pretty standard and gives great results, at least in English (it's not as good in rarer languages).

https://opencv.org/blog/clip/

Essentially, CLIP lets you encode both text and images in the same vector space.
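
To make that concrete, here is a minimal sketch (not the commenter's exact code) that embeds one photo and one caption into CLIP's shared space using Hugging Face transformers; the file name "photo.jpg" and the example caption are placeholders.

```python
# Embed one image and one caption into CLIP's shared vector space.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical file name
inputs = processor(text=["a dog running on a beach"], images=image,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Both vectors live in the same 512-dimensional space, so they can be compared directly.
image_vec = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
text_vec = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
print((image_vec @ text_vec.T).item())  # cosine similarity between photo and caption
```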

It is really easy and pretty fast to generate embeddings; it took less than an hour on Google Colab.
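
A rough sketch of that kind of batch run, under the assumption that the photos sit in a local "photos/" folder and the results get saved to "embeddings.npy" (both placeholders):

```python
# Batch-embed a folder of photos; on a Colab GPU this loop finishes quickly
# even for a few thousand images.
import glob
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = sorted(glob.glob("photos/*.jpg"))  # hypothetical location
embeddings = []
for i in range(0, len(paths), 32):
    batch = [Image.open(p).convert("RGB") for p in paths[i:i + 32]]
    inputs = processor(images=batch, return_tensors="pt").to(device)
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # normalize for cosine similarity
    embeddings.append(feats.cpu().numpy())

np.save("embeddings.npy", np.concatenate(embeddings))
```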

I made a quick and dirty Flask app that lets me query my own collection of pictures and returns the most relevant ones via cosine similarity.
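
Something in that spirit might look like the sketch below; the /search route, "embeddings.npy", and "paths.txt" are assumptions, not the commenter's actual app.

```python
# Embed the text query, score it against precomputed image embeddings with
# cosine similarity, and return the top matches as JSON.
import numpy as np
import torch
from flask import Flask, jsonify, request
from transformers import CLIPModel, CLIPProcessor

app = Flask(__name__)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image_embeddings = np.load("embeddings.npy")   # (n_images, 512), L2-normalized
paths = open("paths.txt").read().splitlines()  # one file path per embedding row

@app.route("/search")
def search():
    query = request.args.get("q", "")
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_vec = model.get_text_features(**inputs)
    text_vec = (text_vec / text_vec.norm(dim=-1, keepdim=True)).numpy()[0]

    scores = image_embeddings @ text_vec       # cosine similarity (both sides normalized)
    top = np.argsort(scores)[::-1][:10]
    return jsonify([{"path": paths[i], "score": float(scores[i])} for i in top])

if __name__ == "__main__":
    app.run(debug=True)
```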

You can query pretty much anything with CLIP (metaphors, lighting, objects, time, location, etc.).

From what I understand, many photo apps offer CLIP embedding search these days, including Immich - https://meichthys.github.io/foss_photo_libraries/

An alternative could be something like BLIP.

This is what I use:

ViT-SO400M-16-SigLIP2-384__webli

I think I found it because it was recommended by Immich as the best, but it still only took a day or two to run against my 5 thousand assets. I’ve tested it against whatever Google is using (I keep a part of my library on Google Photos), and it’s far better.
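
For anyone wanting to try the same checkpoint outside Immich, a hedged sketch with open_clip follows. The hub id "timm/ViT-SO400M-16-SigLIP2-384" is an assumption inferred from the Immich name above (the "__webli" suffix refers to the WebLI training data), so check the Hugging Face hub for the exact id; "photo.jpg" and the captions are placeholders.

```python
# Load a SigLIP 2 checkpoint via open_clip and compare an image against text prompts.
import torch
from PIL import Image
import open_clip

model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:timm/ViT-SO400M-16-SigLIP2-384")
tokenizer = open_clip.get_tokenizer("hf-hub:timm/ViT-SO400M-16-SigLIP2-384")

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # hypothetical file
text = tokenizer(["a birthday party", "a mountain hike"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    print(image_features @ text_features.T)  # cosine similarity per prompt
```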