There's a lot of great work happening both on memory-efficient inference (i.e., running on closer-to-consumer hardware) and on open-source, code-focused models.

A lot of people are excited about the Qwen3-Coder family of models: https://huggingface.co/collections/Qwen/qwen3-coder-687fc861...

For running locally, there are tools like Ollama and LM Studio. Your hardware needs will vary depending on the size and quantization of the model you run, but ~$2k in hardware is reasonable for a lot of models. Some people have had good experiences with M-series Macs, which are probably the best bang for the buck if you're exclusively interested in inference.
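If you want a feel for how little glue code the local route takes once a model is pulled, here's a minimal sketch in Python against Ollama's local HTTP API (the model tag is just an assumption; substitute whatever quantized build fits your RAM):

    import requests

    # Ollama serves a local HTTP API on port 11434 by default.
    # The model tag below is an assumption; use whatever `ollama pull` fetched.
    MODEL = "qwen3-coder"

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": MODEL,
            "prompt": "Write a Python function that parses an ISO 8601 date string.",
            "stream": False,  # return a single JSON object instead of a token stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])

LM Studio exposes a similar local server (OpenAI-compatible), so the same pattern applies there.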

I'd recommend checking out the LocalLLaMA subreddit for more: https://www.reddit.com/r/LocalLLaMA/

Getting results on par with the big labs isn't feasible, but if you prefer to run everything locally, it's a fun and doable project.

Awesome. Great info, thanks

Is this just a fun project for now, or could I actually get a real software-production benefit from it, like I do with tools such as Claude Code?

I am interested in carefully tailoring it to specific projects: integrating curated personal notes, external documentation, scientific papers, etc. via RAG (this part I've already written), and carefully choosing the tools available to the agent (rough sketch below). If I hand-tailor the agents to each project, can I expect something close to the performance boost of Claude Code for $2,000 (USD)?

If not $2,000, then how much would I need? I'm pretty sure that for something like $75,000 I could run a large DeepSeek model locally and get something very close to Claude Code, right?
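To make the question concrete, here's roughly the shape of the per-project wiring I have in mind (a minimal sketch, assuming Ollama's local chat endpoint; retrieve() stands in for the RAG code I've already written, and the model tag is just a placeholder):

    import requests

    def retrieve(query: str, k: int = 5) -> list[str]:
        """Stand-in for my existing RAG lookup over notes, docs, and papers."""
        return ["...relevant excerpt..."] * k

    def ask_project_agent(question: str, model: str = "qwen3-coder") -> str:
        # Stuff the retrieved, project-specific context into the system prompt.
        context = "\n\n".join(retrieve(question))
        messages = [
            {"role": "system",
             "content": "You are this project's coding assistant. "
                        "Ground your answers in the context below.\n\n" + context},
            {"role": "user", "content": question},
        ]
        resp = requests.post(
            "http://localhost:11434/api/chat",  # Ollama's local chat endpoint
            json={"model": model, "messages": messages, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    print(ask_project_agent("How does the retry logic in our HTTP client work?"))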