> I'd love to hook my development tools into a fully-local LLM.

Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

So... using a Raspberry Pi is probably not what you want.

I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There's real value there already, but also plenty of room for improvement. I envision all of us running specialized lightweight LLMs locally/on-device at some point.

I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!

So I am running Ollama on Windows with a 10700K and a 3080 Ti. I'm using models like Qwen3-Coder (4B/8B), Qwen2.5-Coder 15B, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second, depending on the model).
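
For anyone wanting to wire their own tooling into a setup like this, here's a minimal sketch of hitting Ollama's local HTTP API from Python. The endpoint and streaming format are Ollama's defaults; the model tag is just an example, swap in whatever you've actually pulled.

```python
# Minimal sketch: streaming a completion from a locally running Ollama server.
# Assumes Ollama is listening on its default port (11434) and that the
# model tag below (example only) has already been pulled with `ollama pull`.
import json
import requests

def ask_local_llm(prompt: str, model: str = "qwen2.5-coder:14b") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()
    chunks = []
    # Ollama streams newline-delimited JSON objects, each with a "response" fragment.
    for line in resp.iter_lines():
        if not line:
            continue
        piece = json.loads(line)
        chunks.append(piece.get("response", ""))
        if piece.get("done"):
            break
    return "".join(chunks)

if __name__ == "__main__":
    print(ask_local_llm("Write a Python one-liner that reverses a string."))
```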

My use case is custom software that I build and host that leverages LLMs, for example for home automation (domotica), where I use Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. I'm currently looking at fine-tuning these types of models for my work as a senior dev in finance.
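
As a rough illustration of the shortcut-to-LLM idea (not Bropilot's or the poster's actual code), an Apple Watch shortcut could POST free-form text to a small local endpoint, which asks the model to translate it into a constrained JSON device action. The endpoint name, device list, and JSON schema here are all hypothetical.

```python
# Hypothetical sketch: shortcut -> local endpoint -> Ollama chat API -> JSON device command.
import json
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

SYSTEM = (
    "You translate home-automation commands into JSON with keys "
    '"device" (one of: lights, thermostat, blinds) and "action". '
    "Reply with JSON only."
)

@app.post("/command")
def command():
    text = request.get_json(force=True).get("text", "")
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3:8b",  # example tag; use whatever model is pulled locally
            "stream": False,
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": text},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    reply = resp.json()["message"]["content"]
    # Naive: assumes the model obeyed the JSON-only instruction.
    return jsonify(json.loads(reply))

if __name__ == "__main__":
    app.run(port=8000)
```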

Thank you. I'll take a look at Bropilot when I get set up locally.

Have a great week.

> Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

Interesting because he also said the future is small "cognitive core" models:

> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.

https://xcancel.com/karpathy/status/1938626382248149433#m

In which case, a Raspberry Pi sounds like what you need.

It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and that isn't a silver bullet.

For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all else being equal - and even aggressive distillation only gets you so far.

Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high quality synthetic data, you could fit GPT-5 Pro tier of reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or delegate to a larger cloud-hosted model.

And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.

Nobody said it's trivial.

I'm kind of shocked so many people are willing to ship their code up to companies that built their products on violating copyright.

It's a tough thing: I'm a solo dev supporting ~all of it at high quality. I cannot imagine using anything other than $X [1] at the leading edge. Why not have the very best?

Karpathy elides that he is one individual. Across a distribution of individuals, we'd expect a nontrivial number to be fine with 5-10% off leading-edge performance. Why? At the least, because it's free as in beer. At the most, because of concerns about connectivity, IP rights, and so on.

[1] GPT-5 finally dethroned Sonnet after 7 months

Today's Qwen3 30B is about as good as last year's state of the art. For me that's more than good enough. Many tasks don't require the best of the best either.

So much this: people act as if local models were useless, when they were in awe of last year's proprietary models that weren't any better…

Mind linking to "his recent talk"? There are a lot of videos of him, so it's a bit difficult to find the most recent one.