The inference process via ONNX (where most of the CPU time is spent) is already multi-threaded. I briefly tried adding multiprocessing to the tiling, downsampling, and result-merging parts of the codebase, but a trivial implementation gave only marginal improvements, so I didn't pursue it further (although I'm sure it could be improved).
Great, thank you!