The default context is 128k for the smaller Gemma 4s and 256k for the bigger ones, so you're cutting off the context and the model doesn't know how to continue.
Bump it to the native size (or -c 0 may work too).
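For anyone running the GGUF directly with llama.cpp, the -c flag mentioned above is its context-size option, and 0 tells it to use the context length stored in the model's metadata. A sketch, with a placeholder model path:

```shell
# Use the model's full native context: -c 0 reads the context
# length from the GGUF metadata instead of the built-in default.
llama-server -m model.gguf -c 0

# Or pin it explicitly, e.g. 128k tokens:
llama-server -m model.gguf -c 131072
```

These are config fragments, so the exact model filename is a placeholder; everything else is the standard llama-server syntax.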
In that case the model page on ollama.com is incorrect, because it defaults to 16k, so I have to change that to 64/128k manually. I think you are talking about the maximum context size, not the default.
No, the default context in Ollama varies by the memory available: https://docs.ollama.com/context-length
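If the memory-dependent default is too small, Ollama exposes a few ways to raise the context window. A sketch (the model name and 65536 value are just examples):

```shell
# 1. Server-wide default via environment variable:
OLLAMA_CONTEXT_LENGTH=65536 ollama serve

# 2. Per-session, inside an interactive `ollama run`:
#    >>> /set parameter num_ctx 65536

# 3. Baked into a derived model via a Modelfile:
#    FROM gemma3:27b
#    PARAMETER num_ctx 65536
# then build it with:
#    ollama create gemma3-64k -f Modelfile
```

Note that a larger num_ctx increases KV-cache memory use, which is exactly why Ollama scales the default to the memory available.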