Hacker News

nkohari a day ago

You're comparing apples to oranges there. Qwen 3.5 is a much larger model, at 397B parameters vs. Gemma's 31B. Gemma will be better at answering simple questions and doing basic automation; codegen won't be its strong suit.

kgeist a day ago

Qwen3.5 comes in various sizes (including 27B), and judging by the posts on HN, r/LocalLLaMA, etc., it seems to be better at logic/reasoning/coding/tool calling than Gemma 4, while Gemma 4 is better at creative writing and world knowledge (basically nothing has changed since the Qwen3 vs. Gemma 3 era).

Mil0dV a day ago

Does this also apply to Gemma's 26B-A4B vs., say, Qwen's 35B-A3B?

I'm not sure I can make the 35B-A3B work on my 32 GB machine.
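In names like 26B-A4B and 35B-A3B, the first number is the total parameter count of the mixture-of-experts model and the "A" number is the parameters active per token. As a rough rule of thumb (an assumption, not a benchmark), weight memory scales with the total count, while per-token compute scales with the active count at about 2 FLOPs per active parameter, ignoring attention and routing overhead. A minimal sketch comparing the two models on that basis; the `MoEModel` class and `compare` helper are hypothetical, for illustration only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_params_b: float   # total parameters in billions; all must be held in memory
    active_params_b: float  # parameters used per token, in billions

    def gflops_per_token(self) -> float:
        # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter
        # per token; ignores attention and expert-routing overhead.
        return 2.0 * self.active_params_b


def compare(a: MoEModel, b: MoEModel) -> dict:
    """Ratios of b relative to a: memory follows total, compute follows active."""
    return {
        "memory_ratio": b.total_params_b / a.total_params_b,
        "compute_ratio": b.gflops_per_token() / a.gflops_per_token(),
    }


gemma = MoEModel("gemma-26B-A4B", 26.0, 4.0)
qwen = MoEModel("qwen-35B-A3B", 35.0, 3.0)
print(compare(gemma, qwen))
```

By this estimate the 35B-A3B needs roughly a third more memory for its weights while doing somewhat less work per token, so on a 32 GB machine memory is the binding constraint, not compute.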

green7ea 16 hours ago

It should be easy with a Q4 (quantization to 4 bits per weight) and a smallish context.

You won't have much RAM left over, though :-/

At Q4, it's ~20 GiB:

https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
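The ~20 GiB figure follows from a back-of-the-envelope calculation: weight memory is roughly parameters × bits per weight ÷ 8 bytes. The effective bits-per-weight values below are assumptions (real 4-bit GGUF quants carry block scales and keep some tensors at higher precision, so they land somewhat above 4.0), and the sketch ignores the KV cache and runtime buffers, which grow with context length:

```python
GIB = 1024 ** 3  # bytes in one GiB


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB (weights only, no KV cache)."""
    return params * bits_per_weight / 8 / GIB


def headroom_gib(ram_gib: float, params: float, bits_per_weight: float) -> float:
    """RAM left after loading the weights; negative means they do not fit."""
    return ram_gib - weight_gib(params, bits_per_weight)


params = 35e9  # total parameter count of the 35B-A3B model
for bpw in (4.0, 4.5, 4.8):  # assumed effective bits per weight for a "Q4" quant
    print(f"{bpw} bpw: {weight_gib(params, bpw):.1f} GiB weights, "
          f"{headroom_gib(32, params, bpw):.1f} GiB left of 32 GiB")
```

At about 4.8 effective bits per weight this comes to roughly 19.6 GiB, in line with the ~20 GiB above; the remaining ~12 GiB has to cover the OS, other applications, and the KV cache, which is why a smallish context helps.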

rhdunn 13 hours ago

For llama-server (and possibly other similar applications) you can specify the number of layers to offload to the GPU (e.g. `--n-gpu-layers`). By default this is set to run the entire model in VRAM, but you can set it to something like 64 or 32 to get it to use less VRAM. This costs speed, since the layers that aren't offloaded stay in system RAM and run on the CPU, but it allows you to run a larger model, a larger context, or additional models.
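A rough way to pick a value for `--n-gpu-layers` is to divide the quantized model's size by its layer count and see how many layers fit in the VRAM you can spare. The sketch below assumes all layers are the same size and reserves a fixed, guessed amount of VRAM for the KV cache and compute buffers; the helper names, layer count, file name, and sizes are hypothetical placeholders, not values for any particular model. It only builds the `llama-server` command line (with its `-m`, `--n-gpu-layers`, and `-c` options) rather than running it:

```python
import shlex


def gpu_layers_that_fit(model_gib: float, n_layers: int,
                        vram_gib: float, reserve_gib: float = 2.0) -> int:
    """Estimate how many layers to offload, assuming equally sized layers.

    reserve_gib is a guessed allowance for the KV cache and compute buffers.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(0.0, vram_gib - reserve_gib)
    return min(n_layers, int(usable_gib // per_layer_gib))


def llama_server_cmd(model_path: str, n_gpu_layers: int, ctx_size: int) -> list[str]:
    # Build (but do not run) a llama-server invocation.
    return ["llama-server", "-m", model_path,
            "--n-gpu-layers", str(n_gpu_layers),
            "-c", str(ctx_size)]


# Hypothetical example: a 20 GiB model with 40 layers on a 12 GiB GPU.
ngl = gpu_layers_that_fit(model_gib=20.0, n_layers=40, vram_gib=12.0)
print(shlex.join(llama_server_cmd("model-Q4.gguf", ngl, 8192)))
# prints: llama-server -m model-Q4.gguf --n-gpu-layers 20 -c 8192
```

Real layers are not perfectly uniform and the KV cache grows with `-c`, so treat the result as a starting point and lower it if loading runs out of VRAM.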

tredre3 a day ago

Gemma 4 31B is still not impressive at coding compared to even Qwen 3.5 27B. It's just not its strong suit.

So far, Gemma 4 seems excellent at role-playing and document analysis, and decent at making agentic decisions.

gigatexal a day ago

This has been my experience as well; Qwen running locally via Ollama has been very, very impressive.