The on-device models used for Apple Intelligence (writing tools, notification and email/message summaries, etc.) are around 3B parameters.

I mean, they could be better (to put it nicely), but there is a legitimate use-case for them and I'd love to see more work in this space.

https://machinelearning.apple.com/research/introducing-apple...

https://arxiv.org/abs/2407.21075