Yes, we were able to port the CLIP model to work with ONNX Runtime for inference.

May I ask which version of ORT you are using? Were the outputs identical to the PyTorch outputs for the same image?