On the ONNX choice: it's fairly lightweight to install and runs decently fast on a CPU. Other existing libraries would have forced me to install torch or tensorflow.

Awesome, thank you.

Does the library inherently handle threading and concurrency to make use of beefier CPUs, or does that need to be coded into the user's Python script? I'm asking because I'll most likely have a go at extending a current project with a custom model or two, and I'm tossing up where to put money in PC hardware. I suspect I'll need to invest in both a GPU and a CPU: the former for training, the latter because not everything runs on the GPU.

The inference process via ONNX (where most of the CPU time is spent) is multi-threaded. I briefly explored adding multiprocessing for the tiling/downsampling/result-merging parts of the codebase, but a trivial implementation yielded only marginal improvements, so I didn't pursue it further (although I'm sure it could be improved).
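For what it's worth, if you do want to tune the threading yourself, ONNX Runtime exposes thread counts through `SessionOptions`. A minimal sketch (the model path and input name here are placeholders, not anything shipped by the library, and the defaults usually saturate the CPU already):

```python
import onnxruntime as ort

# Control how many threads ONNX Runtime uses to parallelise a single
# inference call (intra-op) and independent graph branches (inter-op).
opts = ort.SessionOptions()
opts.intra_op_num_threads = 8  # e.g. one per physical core
opts.inter_op_num_threads = 1

# "model.onnx" is a placeholder path; substitute your exported model.
session = ort.InferenceSession("model.onnx", sess_options=opts)

# Input/output names depend on how the model was exported:
# outputs = session.run(None, {"input": input_tensor})
```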

Great, thank you!