For photo indexing I'd run CLIP directly and save on compute. CLIP already embeds images and text queries into the same space, so there's no need to bring in a whole language model.
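Roughly what I have in mind, as a sketch rather than a finished tool: embed each photo once, keep the vectors, and rank them by cosine similarity against an embedded text query. `PhotoIndex` and `load_clip_embedders` are names I made up for illustration. The CLIP part assumes `torch`, `transformers` and Pillow are installed, that the `openai/clip-vit-base-patch32` checkpoint is the one you want, and that `get_image_features` / `get_text_features` return plain tensors (as in transformers 4.x). The brute-force search is also an assumption that the collection is small enough not to need a vector database.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class PhotoIndex:
    """Hypothetical in-memory index: one embedding per photo, brute-force search."""

    def __init__(self, embed_image: Callable[[str], Vector]) -> None:
        self._embed_image = embed_image
        self._entries: List[Tuple[str, List[float]]] = []

    def add(self, path: str) -> None:
        # Each photo is embedded exactly once, at indexing time.
        self._entries.append((path, _unit(self._embed_image(path))))

    def search(self, query_vec: Vector, k: int = 5) -> List[Tuple[str, float]]:
        # Score every stored photo against the query and keep the k best.
        q = _unit(query_vec)
        scored = [
            (path, sum(a * b for a, b in zip(q, vec)))
            for path, vec in self._entries
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


def load_clip_embedders(model_name: str = "openai/clip-vit-base-patch32"):
    """Return (embed_image, embed_text) backed by CLIP.

    Assumption: torch, transformers and Pillow are installed. The imports are
    deferred so the index above has no heavy dependencies of its own.
    """
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name).eval()
    processor = CLIPProcessor.from_pretrained(model_name)

    def embed_image(path: str) -> List[float]:
        with Image.open(path) as img:
            inputs = processor(images=img.convert("RGB"), return_tensors="pt")
        with torch.no_grad():
            feats = model.get_image_features(**inputs)
        return feats[0].tolist()

    def embed_text(text: str) -> List[float]:
        inputs = processor(text=[text], return_tensors="pt", padding=True)
        with torch.no_grad():
            feats = model.get_text_features(**inputs)
        return feats[0].tolist()

    return embed_image, embed_text


# Example wiring (file paths are placeholders):
#   embed_image, embed_text = load_clip_embedders()
#   index = PhotoIndex(embed_image)
#   index.add("photos/beach.jpg")
#   hits = index.search(embed_text("a dog on a beach"), k=5)
```

The compute saving comes from the image encoder running once per photo at indexing time; each query afterwards costs one text-encoder pass plus dot products, with no text generation involved.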