It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc). For retrieval minishlab/potion-retrieval-32M works best. If performance is critical minishlab/potion-base-32M is best, although it's a bit bigger (~100mb).

There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and very high throughput, these models will allow you to still make decent quality embeddings, but ofcourse an attention based model will be better, but more expensive.

Thanks man this is incredible work, really appreciate the details you went into.

I've been chewing on if there was a miracle that could make embeddings 10x faster for my search app that uses minilmv3, sounds like there is :) I never would have dreamed. I'll definitely be trying potion-base in my library for Flutter x ONNX.

EDIT: I was thanking you for thorough benchmarking, then it dawned on me you were on the team that built the model - fantastic work, I can't wait to try this. And you already have ONNX!

EDIT2: Craziest demo I've seen in a while. I'm seeing 23x faster, after 10 minutes of work.

Thanks so much for the kind words, that's awesome to hear! If you have any ideas or requests, don't hesitate to reach out!