This is awesome. I love emacs and I love integrating AI into my coding workflow.

What I really want is to be able to run something like this locally for, say, less than $2000 in computer hardware. Is this feasible now, or any time soon? Anyone out there using agents with local models for coding?

There's a lot of great work both on memory-efficient inference (i.e., running on closer-to-consumer machines) and on open-source, code-focused models.

A lot of people are excited about the Qwen3-Coder family of models: https://huggingface.co/collections/Qwen/qwen3-coder-687fc861...

For running locally, there are tools like Ollama and LM Studio. Your hardware needs will vary depending on the size and quantization of the model you run: as a rough rule of thumb, the weights alone take about (parameter count × bits per weight) / 8 bytes, so a 30B-parameter model at 4-bit quantization needs roughly 15 GB of memory before counting the KV cache. Still, $2k in hardware is reasonable for running a lot of models. Some people have good experiences with the M-series Macs, which are probably good bang for the buck if you're exclusively interested in inference.
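If you want to smoke-test a model before wiring it into an editor, Ollama exposes a small HTTP API on its default port, 11434. Here's a minimal sketch in Emacs Lisp, assuming the server is already running; the model tag in the example is an assumption, so substitute whatever you've actually pulled:

    ;; Minimal, blocking call to a local Ollama server.
    ;; Assumes Ollama is running on its default port (11434) and that
    ;; the model passed in has already been fetched with `ollama pull`.
    (require 'url)
    (require 'json)

    (defun my/ollama-generate (prompt model)
      "Send PROMPT to local Ollama MODEL and return the response text."
      (let ((url-request-method "POST")
            (url-request-extra-headers '(("Content-Type" . "application/json")))
            (url-request-data
             (encode-coding-string
              (json-encode `((model . ,model)
                             (prompt . ,prompt)
                             (stream . :json-false)))
              'utf-8)))
        (with-current-buffer
            (url-retrieve-synchronously "http://localhost:11434/api/generate")
          (goto-char (point-min))
          (re-search-forward "\r?\n\r?\n") ; skip past the HTTP headers
          (alist-get 'response (json-read)))))

    ;; e.g. (my/ollama-generate "Write fizzbuzz in Emacs Lisp." "qwen2.5-coder:7b")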

I'd recommend checking out the LocalLLaMA subreddit for more: https://www.reddit.com/r/LocalLLaMA/

Getting results on par with big labs isn't feasible, but if you prefer to run everything locally, it is a fun and doable project.

Awesome. Great info, thanks

Is this just a fun project for now, or could I actually benefit from it in terms of software production, the way I do with tools like Claude Code?

I am interested in carefully tailoring it to specific projects: integrating curated personal notes, external documentation, scientific papers, etc. via RAG (this part I've already written), and carefully choosing the tools available to the agent. If I hand-tailor the AI agents to each project, can I expect to get something perhaps similar to the performance boost of Claude Code for $2000 (USD)?

If not $2000, then how much would I need? I'm pretty sure that for something like $75,000 I could do this with a large DeepSeek model locally, and certainly get something very close to Claude Code, right?

1. Mac Mini.

or:

2. https://frame.work/es/en/desktop

3. https://marketplace.nvidia.com/en-us/developer/dgx-spark/ => https://marketplace.nvidia.com/en-us/reservations/ (cheapest is $3k)

At Computex here they had a demo running some local model on a cluster of 4 Framework desktops. It certainly generated text! Just about one character of it a second.

gptel supports all sorts of models, including local ones.
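For example, here's a minimal sketch of registering a local Ollama backend with gptel, along the lines of what gptel's README shows; the Qwen model tag is an assumption, so substitute whatever you actually have pulled:

    ;; Point gptel at a locally running Ollama server and make the
    ;; local model the default. The model tag is an assumption; check
    ;; `ollama list` for what you actually have.
    (setq gptel-model 'qwen3-coder:30b
          gptel-backend (gptel-make-ollama "Ollama"
                          :host "localhost:11434"
                          :stream t
                          :models '(qwen3-coder:30b)))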