There's also the Burn framework, but it comes with a lot of tradeoffs. It's neat for wgpu targets (including web), but you'll need to implement a lot of things yourself.
Candle is a great choice overall (and there are plenty of examples), but performance is slightly worse than tch.
Personally, if I can get it done with candle that's what I do. It's also pretty neat for serverless.
If I can't, I check whether I can convert the model to ONNX without extra work (or whether an ONNX version is already available).
As a last resort, I think about shipping libtorch via tch.
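
If it helps, here's that order of preference as a tiny Rust sketch. It's only an illustration of the decision, not real integration code: `ModelFacts`, `Backend`, and `pick_backend` are names I made up, and none of the actual crates are called.

```rust
/// Hypothetical summary of what you know about a model before picking a runtime.
#[derive(Debug, Clone, Copy)]
pub struct ModelFacts {
    /// The model can be done in candle.
    pub works_in_candle: bool,
    /// An ONNX version exists or can be produced without extra work.
    pub onnx_is_cheap: bool,
}

/// The three options, in order of preference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Candle,
    Onnx,
    Tch,
}

/// Candle first, then ONNX, and tch (shipping libtorch) only as a last resort.
pub fn pick_backend(model: ModelFacts) -> Backend {
    if model.works_in_candle {
        Backend::Candle
    } else if model.onnx_is_cheap {
        Backend::Onnx
    } else {
        Backend::Tch
    }
}
```

So a model that candle can't handle but that already has an ONNX version ends up on `Backend::Onnx`, and tch only comes up when both earlier checks fail.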