How do you run CLIP in Rust? I'm still not sure what the best inference stack on Rust is --- I tried onnxruntime bindings in Rust a few years ago and it was rather finicky to work with.
We're using the onnxruntime bindings via the ort crate. Our biggest challenge was just bundling the onnxruntime library into the app and making sure everything was signed, etc.
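For reference, here's roughly what loading a model through ort looks like. This is just a sketch: the API has shifted between ort releases, so it assumes the 2.x Session API, and the model path is hypothetical.

    use ort::session::Session;

    fn main() -> ort::Result<()> {
        // Build a session from an exported ONNX file (path is hypothetical).
        let session = Session::builder()?
            .commit_from_file("clip_visual.onnx")?;

        // Inspect the model's declared inputs before wiring up real tensors.
        for input in &session.inputs {
            println!("input: {} ({:?})", input.name, input.input_type);
        }
        Ok(())
    }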
I heard about burn [1] and candle [2] recently and they sounded interesting.
[1] https://crates.io/crates/burn
[2] https://github.com/huggingface/candle
Interestingly, burn supports candle as a backend.
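In burn the backend is just a type parameter, so pointing it at candle is roughly the following. A sketch only: it assumes burn's "candle" feature flag, the version number in the comment is illustrative, and I'm assuming the candle device has a CPU default.

    // Cargo.toml (version illustrative):
    // burn = { version = "0.13", features = ["candle"] }
    use burn::backend::Candle;
    use burn::tensor::Tensor;

    fn main() {
        // Swapping backends means swapping this alias.
        type B = Candle<f32>;
        let device = Default::default();
        let t = Tensor::<B, 2>::ones([2, 3], &device);
        println!("{t}");
    }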
Nice. I'll try to test these out against our current implementation.
Tauri, from my experience, can be extremely frustrating.
Yeah, there are definitely hang-ups along the way. It's a bit of whack-a-mole, but thankfully we've had an easier time developing with Rust than on Electron.
> I'm still not sure what the best inference stack on Rust is
I was just looking into this today!
The options I've found, but yet to evaluate:
- TorchScript + tch = Use `torch.jit.trace` to create a traced model, load with tch/rust-tokenizers (see the sketch after this list)
- rust-bert + tch = Seems to provide slightly higher-level usage, also use traced model
- ONNX Runtime = Convert the .pt model (via transformers.onnx) to .onnx encoder and decoder files, then use onnxruntime + ndarray for inference
- Candle crate = Seems to have the smallest API for basic inference, and AFAIK it can load models saved with model.save() directly, without conversion or other extra steps
These are the different approaches I've found so far, but I've probably missed a bunch. All of them seem OK, but obviously at different abstraction levels, so it depends on what you ultimately want. If anyone knows of any other approaches, I'd be more than happy to hear about them!
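For the TorchScript + tch route, the Rust side ends up looking something like this. A sketch under stated assumptions: the file name and input shape are made up, and the model has to have been traced with torch.jit.trace in Python first.

    use tch::{CModule, Device, Kind, Tensor};

    fn main() -> Result<(), tch::TchError> {
        // Load the traced module exported from Python (path is hypothetical).
        let model = CModule::load("clip_traced.pt")?;

        // CLIP-style image encoders usually take NCHW float input;
        // 1x3x224x224 is assumed here.
        let pixels = Tensor::zeros(&[1, 3, 224, 224], (Kind::Float, Device::Cpu));
        let embedding = model.forward_ts(&[pixels])?;
        println!("{:?}", embedding.size());
        Ok(())
    }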
There's also the burn framework, but there are a lot of tradeoffs to consider. It's neat for wgpu targets (including the web), but you'll need to implement a lot of things yourself.
Candle is a great choice overall (and there are plenty of examples) but performance is slightly worse compared to tch.
Personally, if I can get it done with candle that's what I do. It's also pretty neat for serverless.
If I can't, I check if I can convert it to onnx without extra work (or if there is an onnx available).
As a last resort, I think about shipping libtorch via tch.
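To make the candle-first route concrete, loading a checkpoint and doing basic tensor work is roughly the following. Again a sketch: the safetensors file name is hypothetical, and I'm assuming the candle-core API.

    use candle_core::{Device, Tensor};

    fn main() -> candle_core::Result<()> {
        let device = Device::Cpu;

        // Load all tensors from a safetensors checkpoint into a
        // name -> tensor map (file name is hypothetical).
        let weights = candle_core::safetensors::load("model.safetensors", &device)?;
        for (name, t) in &weights {
            println!("{name}: {:?}", t.shape());
        }

        // Basic tensor ops look like this.
        let a = Tensor::randn(0f32, 1.0, (2, 512), &device)?;
        let b = Tensor::randn(0f32, 1.0, (512, 2), &device)?;
        println!("{}", a.matmul(&b)?);
        Ok(())
    }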
Great resources, thanks. I'll look into the other packages and compare them against our ONNX Runtime setup.