It doesn’t render Markdown or LaTeX, and scrolling is unusable during generation. E4B failed to correctly account for convection and conduction when reasoning about the effects of thermal radiation (31b was very good at this). After 3 questions in a session (with thinking enabled), E4B went off the rails and started emitting nonsense fragments before the stated token limit was hit (unless it isn’t actually checking it).

They have very limited capabilities compared to bigger, more complex models, but for general stuff they are fantastic. We need to set expectations correctly about what they can do. I know there's lots of hype around Gemma 4, even though Qwen3.5 outperformed it. It's just a reliable small model overall, with great small-model abilities.