There's also the Burn framework, but there are a lot of tradeoffs to consider. It's neat for wgpu targets (including the web), but you'll need to implement a lot of things yourself.

Candle is a great choice overall (and there are plenty of examples), but its performance is slightly worse than tch's.

Personally, if I can get it done with Candle, that's what I use. It's also pretty neat for serverless deployments.

If I can't, I check whether the model can be converted to ONNX without extra work (or whether an ONNX version is already available).

As a last resort, I consider shipping libtorch via tch.
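The fallback order described above (Candle first, then ONNX, then libtorch through tch) can be written down as a tiny decision function. This is only a minimal sketch in plain Rust: `Backend`, `ModelInfo`, and `pick_backend` are hypothetical names for illustration and are not part of Burn, Candle, tch, or any ONNX runtime crate.

```rust
/// Hypothetical enum naming the three options, in order of preference.
#[derive(Debug, PartialEq)]
enum Backend {
    Candle,
    Onnx,
    Tch,
}

/// Hypothetical description of a model; these fields are illustrative
/// assumptions, not any crate's API.
struct ModelInfo {
    /// Can the model be done with Candle?
    candle_feasible: bool,
    /// Is an ONNX version available, or can one be made without extra work?
    onnx_available: bool,
}

/// Applies the preference order: Candle, then ONNX, then tch as a last resort.
fn pick_backend(model: &ModelInfo) -> Backend {
    if model.candle_feasible {
        Backend::Candle
    } else if model.onnx_available {
        Backend::Onnx
    } else {
        // Last resort: ship libtorch and call it through tch.
        Backend::Tch
    }
}

fn main() {
    let model = ModelInfo { candle_feasible: false, onnx_available: true };
    println!("{:?}", pick_backend(&model)); // prints "Onnx"
}
```

The point of writing it this way is that each option is only considered when the cheaper one above it has been ruled out.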