I haven't seen anybody else point this out in the thread, but this is running on 8GB of RAM. It's not the full Gemma 4 32B model, and the experience is different enough from running the flagship that it's almost misleading to present it that way.

It's their E2B and E4B variants (effectively 2B and 4B parameters, and quantized on top of that)

https://ai.google.dev/gemma/docs/core/model_card_4#dense_mod...
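
Back-of-envelope on the weight footprint (a sketch; the parameter counts and quantization levels below are assumptions for illustration, not published figures):

    # Rough weight-only footprint: params * bits_per_weight / 8 bytes.
    def weight_footprint_gb(params_billion, bits_per_weight):
        """Approximate size of the model weights alone, in GB."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for name, params in [("~2B effective", 2.0), ("~4B effective", 4.0), ("32B flagship", 32.0)]:
        for bits in (16, 8, 4):
            print(f"{name:>14} @ {bits:>2}-bit: ~{weight_footprint_gb(params, bits):.1f} GB")

    # Roughly 1-2 GB for the small variants at 4/8-bit, vs ~16 GB for a 32B
    # model even at 4-bit, so weights alone already blow an 8 GB phone's budget.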

The relevant constraint when running on a phone is power, not really RAM footprint. Running the tiny E2B/E4B models makes sense; that's essentially what they're designed for.

Depends on the phone. I have trouble fitting models into memory on my iPhone 13 before iOS kills the app. I imagine newer phones with more RAM don't have this issue, especially with some new flagship phones shipping 16+ GB of memory.
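
A rough way to sanity-check whether a model will fit (pure back-of-envelope; the usable-RAM fraction and overhead numbers are assumptions, and iOS's actual per-app memory limits vary by device and OS version):

    # Back-of-envelope: do weights + KV cache + runtime overhead fit in the
    # slice of RAM the OS will actually let one app use? All numbers are
    # illustrative assumptions, not measured limits.
    def fits(model_gb, kv_cache_gb, device_ram_gb, usable_fraction=0.5):
        runtime_overhead_gb = 0.5  # assumed: buffers, tokenizer, the app itself
        budget_gb = device_ram_gb * usable_fraction
        return model_gb + kv_cache_gb + runtime_overhead_gb <= budget_gb

    # 4-bit ~4B model with a modest context window:
    print(fits(model_gb=2.0, kv_cache_gb=0.5, device_ram_gb=4))   # iPhone 13-class: False
    print(fits(model_gb=2.0, kv_cache_gb=0.5, device_ram_gb=16))  # 16 GB flagship: True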

It absolutely is RAM…

So much so that it's what pushed Apple to increase the base RAM across their lineup.

Between the GPU, NPU, and big.LITTLE cores, many phones have no fewer than four different power profiles they can run inference at. It's about as solved as it will get without an architectural overhaul.