The relevant constraint when running on a phone is power, not RAM footprint. Running the tiny E2B/E4B models makes sense; this is essentially what they're designed for.

Depends on the phone. I have trouble fitting models into memory on my iPhone 13 before iOS kills the app. I imagine newer phones with more RAM don't have this issue, especially with some new flagship phones having 16+ GB of memory.
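For a rough sense of why RAM bites: weights alone for a 4B-parameter model at 4-bit quantization come to about 2 GB before you count KV cache, activations, and runtime buffers. A quick back-of-the-envelope sketch (the 1.2× overhead factor is my guess, not a measured number):

```python
def model_footprint_gb(params_billions: float, bits_per_weight: int,
                       overhead_factor: float = 1.2) -> float:
    """Estimate resident memory in GiB for a quantized model.

    overhead_factor is an assumed fudge for KV cache, activations,
    and runtime buffers; real overhead varies with context length.
    """
    bytes_per_weight = bits_per_weight / 8
    weights_gib = params_billions * 1e9 * bytes_per_weight / (1024 ** 3)
    return weights_gib * overhead_factor

# A 4B-parameter model at 4-bit quantization:
print(round(model_footprint_gb(4, 4), 2))  # → 2.24
```

On a 4 GB phone, where iOS also caps how much a single app may allocate before it gets jetsammed, ~2.2 GB resident is already uncomfortable; bump to 8-bit weights or a bigger model and it simply won't fit.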

It absolutely is RAM…

So much so that this is what pushed Apple to increase their base RAM sizes.

Between the GPU, the NPU, and big.LITTLE CPU cores, many phones have no fewer than four different power profiles at which they can run inference. It's about as solved as it will get without an architectural overhaul.