With M2, yes - I've used it in Claude Code (e.g. native tool calling), Roo/Cline (e.g. custom tool parsing), etc. It's quite good, and for a while it was the best model to self-host. At 4-bit quantization it fits on 2x RTX 6000 Pro (~200GB VRAM total) with about 400k context using an fp8 KV cache. It's very fast thanks to its low active parameter count, stable at long context, and quite capable in any agent harness (its training specialty). M2.1 should be a solid bump over M2, which was undertrained relative to even much smaller models.
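For anyone wondering how the "4-bit on ~200GB with 400k context" math works out, here's a rough back-of-envelope sketch. The ~230B total parameter count is M2's published size; the layer count, KV head count, and head dim below are illustrative assumptions, not official specs, so treat the output as ballpark only:

```python
# Back-of-envelope VRAM estimate: 4-bit weights + fp8 KV cache.
# Architecture numbers (layers, KV heads, head dim) are assumptions.

def weight_vram_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Memory for quantized weights, in GB."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_vram_gb(context_len: int, n_layers: int, n_kv_heads: int,
                     head_dim: int, bytes_per_elem: float) -> float:
    """KV cache holds 2 tensors (K and V) per layer per token."""
    return (2 * context_len * n_layers * n_kv_heads * head_dim
            * bytes_per_elem / 1e9)

# ~4-bit quant with some overhead for scales/zero-points
weights = weight_vram_gb(total_params_b=230, bits_per_weight=4.5)
# fp8 KV cache = 1 byte per element
kv = kv_cache_vram_gb(context_len=400_000, n_layers=62, n_kv_heads=8,
                      head_dim=128, bytes_per_elem=1)
print(f"weights ≈ {weights:.0f} GB, kv ≈ {kv:.0f} GB, "
      f"total ≈ {weights + kv:.0f} GB")
```

Under these assumptions you land around 130GB of weights plus ~50GB of KV cache, which is why it squeezes onto two 96GB cards with room to spare for activations.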