Hacker News

peder 9 hours ago [ - ]

Did you have any Anthropic vs OpenAI specification issues with Claude Code? I have been using mlx_vlm and vMLX and I get 400 Bad Request errors from Claude Code. Presumably you're not seeing those issues with llama-server ?

d4rkp4ttern 7 hours ago [ - ]

Correct, no issues because since at least a few months, llama.cpp/server exposes an Anthropic messages API at v1/messages, in addition to the OpenAI-compatible API at v1/chat/completions. Claude Code uses the former.

selectodude 8 hours ago [ - ]

I’ve jumped over to oMLX. A ton of rough edges but I think it’s the future.

peder an hour ago [ - ]

Have you been using `omlx serve`? If so, how are you bumping up the max context size? I'm not seeing a param to go above 32k?

vlowther 5 hours ago [ - ]

Same. Opencode + oMLX (0.3.4) + unsloth-Qwen3-Coder-Next-mlx-8bit on my M5 Max w 128GB is the sweet spot for me locally. The prompt decode caching keeps things coherent and fast even when contexts get north of 100k tokens.