In the past I've usually found that Gemini (Pro, Flash) would get stuck on a problem and then seemingly start doing some kind of random search, trying this and that and just burning through tokens. When that happened I'd switch (in Antigravity) to Claude Sonnet 4.6 and it would cut right to the chase and find the problem quickly. But the other day I was out of Claude tokens, so I went back to Gemini 3.1 Pro and asked about a Verilog simulation problem that Claude had been stuck on - and it figured it out in a few minutes.

Pardon my lack of depth on TFA here, but in my experience at work, Gemini is far less accurate than Claude or OpenAI on queries about technical commands. Like, I don't trust it at all. Maybe it has its place, but not as a general advisor.

I think what you’re seeing here is a difference in the amount of “world knowledge” encoded in the feed-forward (“perceptron”) parts of the model, as opposed to how good the model is at the attention part, which you could think of as pure token prediction using only what’s in the context window.

If true, that would suggest Gemini/Gemma would be great in a RAG situation, where a world model isn’t needed because the model is being spoon-fed all the relevant information, and less good at greenfield tasks.
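To make the "spoon-fed" point concrete, here's a minimal sketch of that RAG pattern: retrieve the relevant docs and place them directly in the context window so the answer doesn't depend on the model's baked-in world knowledge. The function names, toy retriever, and example docs are all hypothetical, not any particular library's API.

```python
def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank docs by naive keyword overlap with the query."""
    words = query.lower().split()
    scored = sorted(docs, key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Everything the model needs is placed directly in the context window."""
    context = "\n\n".join(retrieve(query, docs))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical project docs standing in for whatever you'd feed NotebookLM.
project_docs = [
    "The simulation testbench drives clk at 100 MHz and releases reset after 10 cycles.",
    "Waveform dumps are enabled with $dumpfile/$dumpvars in tb_top.v.",
    "The release checklist lives in docs/release.md.",
]

print(build_prompt("Why is my waveform dump empty?", project_docs))
```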

That’s interesting to me, because I’ve been struggling to understand how Gemma 4 is so good in my local use and how NotebookLM does such a great job when I give it project docs, and yet Gemini has always seemed behind Claude when I use it cold for stuff.