I'll add docs! TL;DR: in the onboarding flow (or in the Add Model menu), you can select adding a custom LLM. It'll ask for your API base URL, which is whatever localhost+port setup you're using, and for an env var to read the API credential from. Just set that to any non-empty value, since local models typically don't actually use authentication. Then you're good to go.
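In case it helps, here's a minimal sketch of what "base URL + dummy credential" means in practice, using the OpenAI Python client against a local server. The port, env var name, and model name below are just examples (LM Studio typically serves at localhost:1234, ollama at localhost:11434); swap in whatever your local setup actually exposes.

```python
# Sanity-check that a local OpenAI-compatible server is reachable.
# Assumptions: LM Studio's default endpoint at http://localhost:1234/v1 and a
# made-up env var name LOCAL_LLM_API_KEY holding a dummy credential.
import os

from openai import OpenAI

# Local servers usually ignore the key, but the client (and the custom-model
# setup) wants a non-empty value, so export something like:
#   export LOCAL_LLM_API_KEY=local
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key=os.environ.get("LOCAL_LLM_API_KEY", "local"),
)

resp = client.chat.completions.create(
    model="qwen3-coder-30b",  # whatever model name your local server exposes
    messages=[{"role": "user", "content": "Say hi in one word."}],
)
print(resp.choices[0].message.content)
```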

IMO gpt-oss-120b is actually a very competent local coding agent, and it should fit on your 128GB MacBook Pro. I've used it while testing Octo, and it's quite good for a local model. The best open model in my opinion is zai-org/GLM-4.5, but it probably won't fit on your machine (it works well over APIs, though; my tip is to avoid OpenRouter, since quite a few of its round-robin hosts have broken implementations).

Ok wonderful! Thanks.

I'm trying to set it up right now with LM Studio and qwen3-coder-30b. Hopefully it'll work. Happy to take any pointers on anything y'all have tried that seemed particularly promising.

For sure! We also have a Discord server if you need any help: https://discord.gg/syntheticlab

Follow-up question: can the diff-apply and fix-json models be run locally with octofriend as well, or do they have to hit your servers? Thanks!

They're just Llama 3.1 8B Instruct LoRAs, so yes, you can run them locally! Probably the easiest way is to merge the weights, since AFAIK ollama and llama.cpp don't support LoRAs directly (although llama.cpp has utilities for doing the merge). In the settings menu or the config file, you should be able to set any API base URL + env var credential for the autofix models, just like for any other model, which lets you point them at your local server :)

The weights are here:

https://huggingface.co/syntheticlab/diff-apply

https://huggingface.co/syntheticlab/fix-json

And if you're curious about how they're trained (or want to train your own), the entire training pipeline is in the Octofriend repo.
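If you'd rather do the merge in Python than with llama.cpp's tooling, here's a rough sketch using transformers + peft. It assumes the base model is meta-llama/Llama-3.1-8B-Instruct (per the description above) and enough RAM to load it in bf16; double-check the adapter card on Hugging Face for the exact base repo.

```python
# Rough sketch: merge the diff-apply LoRA into its Llama 3.1 8B Instruct base
# so the result can be served by runtimes that don't load LoRAs directly.
# Assumes the base repo is meta-llama/Llama-3.1-8B-Instruct; confirm on the
# adapter card before running.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER = "syntheticlab/diff-apply"  # or "syntheticlab/fix-json"

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)

# Bake the LoRA deltas into the base weights and save a plain checkpoint,
# which you can then convert (GGUF, MLX, etc.) for your local server.
merged = model.merge_and_unload()
merged.save_pretrained("diff-apply-merged")

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.save_pretrained("diff-apply-merged")
```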

I think qwen3-coder-30b might be your best bet right now. GLM-4.5-Air is probably next best. I'd run them at 8-bit using MLX.
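In case it's useful, a minimal sketch of loading an 8-bit MLX quantization with mlx-lm is below; the Hugging Face repo name is an assumption, so substitute whichever 8-bit conversion you end up using (or quantize one yourself with mlx-lm's convert tooling).

```python
# Minimal sketch of running an 8-bit MLX build locally with mlx-lm.
# The repo name is an assumption: look up an existing 8-bit MLX conversion of
# qwen3-coder-30b (or GLM-4.5-Air) on Hugging Face, or quantize it yourself.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit")

prompt = "Write a Python function that reverses a string."
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```

For actually hooking it up to Octofriend you'd put an OpenAI-compatible server in front (LM Studio already gives you one, and mlx-lm ships a server command too) rather than calling generate directly.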