Looks awesome. I work closely with multimodal search and have had trouble porting CLIP to ONNX and other formats due to the lack of multi-head attention operators. Are you using Python for the CLIP inference, or did you manage to port it to a format hostable in a Rust or C/C++ inference runtime?

Yes, we were able to port the CLIP model to work with ONNX Runtime for inference.
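
For anyone hitting the same wall: below is a minimal sketch of one way such an export can look, assuming the Hugging Face `transformers` CLIP implementation (the thread doesn't say which implementation was actually used). The key point for the multi-head attention concern above is that `torch.onnx.export` traces `nn.MultiheadAttention` down to primitive matmul/softmax ops, so the missing ONNX MHA operator isn't a blocker.

```python
# Sketch only: assumes the Hugging Face transformers CLIP implementation,
# not necessarily what the original commenters used.
import torch
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

# Wrap the image tower so the exported graph takes pixel values only.
class ImageEncoder(torch.nn.Module):
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, pixel_values):
        return self.clip.get_image_features(pixel_values=pixel_values)

dummy = torch.randn(1, 3, 224, 224)  # ViT-B/32 expects 224x224 inputs
torch.onnx.export(
    ImageEncoder(model),
    dummy,
    "clip_image.onnx",
    input_names=["pixel_values"],
    output_names=["image_embeds"],
    dynamic_axes={"pixel_values": {0: "batch"}},
    # Attention is traced into matmul/softmax primitives, so a standard
    # opset suffices despite ONNX having no multi-head attention op.
    opset_version=14,
)
```

The text tower can be exported the same way via `get_text_features`; exporting the two towers separately keeps each graph simple and lets you host only the one you need.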

May I ask, which version of ORT are you using? Were the outputs identical to PyTorch outputs for the same image?
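
For reference, a self-contained sketch of the kind of parity check being asked about, assuming the `clip_image.onnx` file from the export sketch above and a hypothetical test image `cat.jpg`. Bit-identical outputs across runtimes are unlikely; agreement to roughly 1e-5 in fp32 is the usual expectation.

```python
# Sketch only: "clip_image.onnx" and "cat.jpg" are placeholder names.
import numpy as np
import torch
import onnxruntime as ort
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
pixel_values = processor(images=Image.open("cat.jpg"),
                         return_tensors="pt")["pixel_values"]

# Reference output from the PyTorch image tower.
with torch.no_grad():
    torch_out = model.get_image_features(pixel_values=pixel_values).numpy()

# Same preprocessed image through the exported ONNX graph.
sess = ort.InferenceSession("clip_image.onnx",
                            providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"pixel_values": pixel_values.numpy()})[0]

print("max abs diff:", np.max(np.abs(torch_out - onnx_out)))
print("allclose:", np.allclose(torch_out, onnx_out, atol=1e-4))
```

Note that preprocessing must match exactly on both sides (resize, crop, normalization); feeding both runtimes the same preprocessed tensor, as here, isolates the model graphs themselves.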