Yale has an amazing one, worth looking at: https://lux.collections.yale.edu/

Is that multi-modal search, or just textual matching?

Every example I tried could be explained by simple text matching.