Nice! Tried it on an iPhone 16 Pro and got 30 TPS from the Gemma-4-E2B-it model.

The phone got considerably hot during inference, though. It's quite impressive performance, and I can't wait to try it myself in one of my personal apps.

It's at least somewhat limited in non-English content. It knows how to make lentil soup, so I was happy that I'd never need to look up recipe sites with awful UX and ads again, but then it couldn't come up with a recipe for "Kalter Hund"/"Kalte Schnauze". So sad ;)

Still, absolutely fabulous. What a time to be alive!

Strangely, my iPhone 14 stays at normal temperature when running the E2B model. But it's also a lot slower (I'm not sure how to measure the exact tokens per second; ~12 if I had to guess).
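
On measuring tokens per second: if the app streams tokens one at a time, a rough way is to timestamp the first and last token and divide the number of intervals by the elapsed time. Below is a minimal Python sketch of that arithmetic, not how any particular app does it. The names `measure_tps` and `fake_stream` are made up for illustration, and `fake_stream` only simulates a model by sleeping between tokens.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_tps(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return decode throughput in tokens per second for a token stream.

    Timing starts when the first token arrives, so prompt processing
    (time to first token) is excluded from the rate.
    """
    count = 0
    start = None
    end = None
    for _ in tokens:
        now = clock()
        if start is None:
            start = now
        end = now
        count += 1
    if count < 2 or end == start:
        raise ValueError("need at least two tokens with distinct timestamps")
    # count - 1 intervals elapsed between the first and the last token
    return (count - 1) / (end - start)


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(f"{measure_tps(fake_stream(10, 0.02)):.1f} tokens/s")
```

Excluding the time to first token keeps prompt processing from dragging the number down, which matters for short replies. Without access to the token stream, counting the words in a reply and timing it with a stopwatch gives a cruder estimate, since tokens and words are not one-to-one.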