I’ve jumped over to oMLX. A ton of rough edges, but I think it’s the future.

Have you been using `omlx serve`? If so, how are you bumping up the max context size? I'm not seeing a param to go above 32k.
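
(For reference, bare mlx-lm bounds context with a rotating KV cache via a `max_kv_size` argument, which is why I expected `omlx serve` to have an equivalent knob. A minimal sketch, with a placeholder model path and prompt:)

```python
# How plain mlx-lm bounds context: max_kv_size swaps in a rotating KV
# cache so memory stays capped past the limit. Model path and prompt are
# placeholders; I haven't found the omlx equivalent of this knob.
from mlx_lm import load, generate

model, tokenizer = load("path/to/any-mlx-model")  # placeholder path

out = generate(
    model,
    tokenizer,
    prompt="your prompt here",  # placeholder
    max_tokens=128,
    max_kv_size=65536,  # cap the KV cache at ~64k tokens
)
print(out)
```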

Same. Opencode + oMLX (0.3.4) + unsloth-Qwen3-Coder-Next-mlx-8bit on my M5 Max with 128GB is the sweet spot for me locally. The prompt caching keeps things coherent and fast even when contexts get north of 100k tokens.
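
For anyone curious what that looks like one level down, here's a minimal prompt-caching sketch with plain mlx-lm, which is roughly what I assume oMLX is doing under the hood. Model path and prompts are placeholders:

```python
# Minimal prompt-caching sketch with mlx-lm (my assumption about what
# oMLX does under the hood). The cache holds the KV state for everything
# decoded so far, so each turn only pays for its own new tokens.
from mlx_lm import load, generate
from mlx_lm.models.cache import make_prompt_cache

model, tokenizer = load("path/to/Qwen3-Coder-Next-mlx-8bit")  # placeholder path

cache = make_prompt_cache(model)  # reused across turns

for turn in ["Summarize main.py.", "Now refactor the hot loop."]:
    reply = generate(
        model,
        tokenizer,
        prompt=turn,
        max_tokens=256,
        prompt_cache=cache,  # appends this turn's KV state to the cache
    )
    print(reply)
```

The point is that the long shared prefix never gets re-decoded, which is what keeps latency sane once the conversation is 100k+ tokens deep.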