Hacker News

0xc133 4 hours ago

With the YaRN and RoPE scaling arguments for llama.cpp, you could run qwen3.6-27B with 1M context… if you have enough memory to store it.
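
For a rough sense of what "enough memory" means, here is a back-of-envelope sketch of the KV cache size at a given context length. It assumes plain full attention where every layer caches keys and values for every token, so a model with sliding-window or linear-attention layers would need less. The layer count, KV head count, and head dimension below are made-up placeholders, not the real config of qwen3.6-27B; read the actual values from the model's metadata before trusting the number.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size in bytes for a single sequence.

    The leading factor of 2 covers keys plus values. bytes_per_elem=2
    corresponds to an f16 cache; a quantized cache would use less.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical architecture numbers, for illustration only.
size = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=1_000_000)
print(f"{size / 2**30:.1f} GiB for the KV cache alone")
```

That is on top of the model weights, and the cache grows linearly with context, so halving the context or the bytes per element halves the estimate.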