The Gemma 4 E2B model runs really slowly on my Android phone, a Samsung Galaxy S21 Ultra. It takes at least 20-30 seconds to warm up before it replies.

Running LLMs is probably the first time I've found the SoCs of that generation lacking. Even Google's underpowered Tensor chips make a huge difference when it comes to LLM performance.

You can check your settings for GPU acceleration; enabling it may make a big difference.

From what I've found online, the difference may also simply come down to Snapdragon versus Exynos GPU driver optimizations, in which case I don't think the performance can be fixed by anyone but Samsung. Others online seem to get decent performance out of the model on the S21 Ultra, at the very least.

You need a modern phone; local LLMs don't run well on older ones.

The bigger E4B model is pretty fast on my Galaxy S21 Ultra even with thinking enabled. Maybe GPU acceleration was not enabled?

I think there's quite the performance difference between the S21 Ultra (Snapdragon 888) and the S21 Ultra (Exynos 2100).

Qualcomm has optimized libraries for running LLMs on their chips that I don't believe Samsung has bothered with.

You need an S24 Ultra or newer, I think.