Hacker News

What I noticed when using OpenCode with llama.cpp was that llama.cpp's default host RAM prompt cache size is way too small for, say, Qwen3.6 27B at 128k context.

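The default is just 8GB, and a full 128k context for the dense model can take most of that. Then an agent comes along with its own prompt, the earlier cached prompt gets evicted, and the next request that needed it is a cache miss and has to be reprocessed from scratch.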
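To put rough numbers on that, here is a back-of-the-envelope sketch of how big one cached context can get. The layer count, KV head count and head dim below are made-up placeholders, not the real Qwen config, and I'm assuming a cached prompt costs about as much as its f16 KV cache, which may not match exactly what llama.cpp stores.

```python
GIB = 2**30

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size for a dense transformer.

    Each token stores one K and one V vector per layer, each of
    n_kv_heads * head_dim elements (bytes_per_elem=2 means f16).
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

def entries_that_fit(cache_limit_bytes, entry_bytes):
    """How many full-size cached prompts fit under the RAM limit."""
    return cache_limit_bytes // entry_bytes

# Hypothetical model shape (placeholders, NOT the actual model config).
one_ctx = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128, n_ctx=128 * 1024)

print(f"one 128k context: {one_ctx / GIB:.1f} GiB")              # 7.0 GiB
print("fits in  8 GiB:", entries_that_fit(8 * GIB, one_ctx))     # 1
print("fits in 48 GiB:", entries_that_fit(48 * GIB, one_ctx))    # 6
```

With numbers in that ballpark, an 8GB cache holds a single full context, so any second long prompt pushes the first one out. At 48GB there is room for several.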

I bumped the cache size (--cache-ram, or -cram for short, IIRC) up to 48GB and had much better results.