Cool project!

I see you saved a spot to show how to use it with an alternative embedding model. It would be nice to be able to use the library without an OpenAI API key. It might even make sense to vendor a basic open-source model in your package so it works out of the box without remote dependencies.

Yes, I'm planning out-of-the-box support for nomic[1], which can run in-process, and ollama, which runs as a local server and supports many free embedding models[2].

[1]: https://www.nomic.ai/blog/posts/nomic-embed-text-v1

[2]: https://ollama.com/search?c=embedding
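For anyone who wants to try the local-server route before built-in support lands, here is a rough sketch of fetching an embedding from ollama over HTTP. It assumes ollama is running at its default local address and that the `nomic-embed-text` model has already been pulled; the helper names `build_embedding_request` and `embed` are made up for illustration and are not part of jellyjoin.

```python
import json
import urllib.request

# Assumption: ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text, model="nomic-embed-text", url=OLLAMA_URL):
    """Build the POST request for one embedding (hypothetical helper)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="nomic-embed-text", url=OLLAMA_URL):
    """Send the request and return the embedding as a list of floats."""
    request = build_embedding_request(text, model, url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embedding"]
```

Calling `embed("some text")` needs the server to be up; how such a function would plug into the library's embedding hook is left open here, since that interface isn't shown in this thread.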

Project is super cool.

If you're adding more LLM integration, a cool feature might be sending the results of allow_many="left" off to an LLM completions API that supports structured outputs. E.g., imagine N_left=1e5 and N_right=1e5, where left and right are different datasets. You could use jellyjoin to identify the top ~5 candidates in right for each left record, reducing the candidate matches from 1e10 to 5e5. Then you ship those 5e5 pairs off to an LLM for final scoring/matching.
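To make the shape of that pipeline concrete, here is a toy sketch of the candidate-reduction step. It does not use jellyjoin's API: `difflib.SequenceMatcher` from the standard library stands in for the similarity score, and `top_k_candidates` is a hypothetical helper. The stand-in still compares every pair, so it only illustrates the output shape (k candidates per left record), not how to scale to 1e5 rows.

```python
import difflib
import heapq


def top_k_candidates(left, right, k=5):
    """Map each left string to its k most similar right strings.

    Similarity here is difflib's ratio, a stand-in for whatever
    score the real join produces.
    """
    candidates = {}
    for l in left:
        scored = (
            (difflib.SequenceMatcher(None, l.lower(), r.lower()).ratio(), r)
            for r in right
        )
        # Keep only the k best-scoring right-hand records.
        candidates[l] = [r for _, r in heapq.nlargest(k, scored)]
    return candidates


left = ["Apple Inc.", "Microsoft Corp"]
right = ["apple incorporated", "microsoft corporation", "alphabet", "amazon"]
cands = top_k_candidates(left, right, k=2)

# Pairs to send downstream: len(left) * k instead of len(left) * len(right).
pairs_for_llm = [(l, r) for l, rs in cands.items() for r in rs]
```

With N_left = N_right = 1e5 and k = 5, the same bookkeeping gives 1e5 * 5 = 5e5 pairs for the LLM rather than 1e5 * 1e5 = 1e10.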