Thanks for pointing this out, mdaniel & woadwarrior01. Reducing the footprint is definitely on my radar and something I’m actively working on. I actually started with CoreML but switched to ONNX after running into some issues.

That said, it’s kind of amazing that we can run ~90 MB models this efficiently on our devices today, and the performance has been really encouraging. Appreciate the feedback!