I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.

It's sad that Pis are now so overpriced. They used to be fun little tinker boards that were semi-cheap.

The Raspberry Pi Zero 2 W is about as fast as a Pi 3, way smaller, and only costs about $15, I think.

The high end Pis aren’t $25 though.

The Pi 4 is still fine for a lot of low end use cases and starts at $35. The Pi 5 is in a harder position. I think the CM5 and Pi 500 are better showcases for it than the base model.

Between the microcontrollers, the Zero models, the Pi 4, and the Pi 5, they cover quite a full range, from very inexpensive, low-power boards to SBCs with moderate price and performance.

One of the bigger problems with the Pi 5 is that many of the classic Pi use cases don't benefit from more CPU than the Pi 4 had. PCIe is nice, but you might as well go CM5 if you want something like that. The 16GB model would be more interesting if it had the GPU and memory bandwidth to generate tokens at a decent rate, but it doesn't.
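As a rough back-of-the-envelope illustration of why (all numbers below are approximate assumptions, not measurements): decode speed is mostly memory-bandwidth bound, since every generated token has to stream the full set of weights from RAM.

```python
# Rough estimate of best-case decode speed on a Pi 5 (assumed numbers).
bandwidth_gb_s = 17.0     # ~LPDDR4X-4267 on a 32-bit bus, approximate
model_params_b = 4.0      # a small 4B-parameter model
bytes_per_param = 0.5     # ~4-bit quantization

model_size_gb = model_params_b * bytes_per_param     # ~2 GB of weights
tokens_per_sec = bandwidth_gb_s / model_size_gb      # upper bound, ignores compute and overhead
print(f"~{tokens_per_sec:.0f} tokens/s best case")   # roughly 8-9 tokens/s
```

Usable for small jobs, but nowhere near what a GPU with real memory bandwidth gives you.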

I still think using any other brand of SBC is an exercise in futility, though. Raspberry Pi products have the community, support, and ecosystem behind them that no other SBC can match.

> I'd love to hook my development tools into a fully-local LLM.

Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

So... using an RPi is probably not what you want.

I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There is still real value there, though also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.

I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!

So I am running Ollama on Windows with a 10700K and a 3080 Ti. I'm using models like Qwen3-Coder (4B/8B), Qwen2.5-Coder 14B, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second depending on the model).

My use case is custom software that I build and host that leverages LLMs, for example for home automation, where I use Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. Currently I'm looking at fine-tuning these kinds of models for work; I'm a senior dev in finance.
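If anyone wants to wire up something similar, the glue code is small: everything just talks to Ollama's local HTTP API. A minimal sketch (the model tag and prompt are placeholders; use whatever you have pulled locally):

```python
import requests

# One non-streaming request to a locally running Ollama server (default port 11434).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:14b",  # placeholder; any locally pulled model tag works
        "prompt": "Write a Python function that parses ISO-8601 dates.",
        "stream": False,               # return a single JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```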

Thank you. I'll take a look at Bropilot when I get set up locally.

Have a great week.

> Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.

Interesting because he also said the future is small "cognitive core" models:

> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.

https://xcancel.com/karpathy/status/1938626382248149433#m

In which case, a Raspberry Pi sounds like what you need.

It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and that isn't a silver bullet.

For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all other things being equal - and even aggressive distillation only gets you so far.

Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high quality synthetic data, you could fit GPT-5 Pro tier of reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or delegate to a larger cloud-hosted model.

And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.
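(For what it's worth, the distillation objective mentioned above is the easy part; the data and post-training around it are where the difficulty lives. A generic, textbook-style sketch of soft-label distillation, not any lab's actual recipe:)

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL distillation: push the student's token distribution
    toward the temperature-softened teacher distribution."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # batchmean matches the KL definition; t*t keeps gradient scale comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```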

Nobody said it's trivial.

I'm kind of shocked so many people are willing to ship their code up to companies that built their products on violating copyright.

It's a tough thing; I'm a solo dev supporting ~everything at high quality. I cannot imagine using anything other than $X [1] at the leading edge. Why not have the very best?

Karpathy elides that he is one individual. Across a distribution of individuals, we should expect a nontrivial number of them to be fine with 5-10% off leading-edge performance. Why? At the least, because it's free as in beer. At the most, because of concerns about connectivity, IP rights, and so on.

[1] GPT-5 finally dethroned Sonnet after 7 months

Today's Qwen3 30B is about as good as last year's state of the art. For me that's more than good enough. Many tasks don't require the best of the best either.

So much this: people act as if local models were useless, when they were in awe of last year's proprietary models that weren't any better…

Mind linking to "his recent talk"? There's a lot of videos of him so it's a bit difficult to find what's most recent.

I think the problem is that getting multiple Raspberry Pis is never the cost-effective way to run heavy loads.

$500 gives you about six 8GB RPi 5s or four 16GB ones, excluding accessories and other equipment needed to get this working.

You'll be much better off spending that money on something else more useful.

> $500

Yeah, like a Mac Mini or something with better bandwidth.

Raspberry Pis going up in price makes them very unattractive, since there is a wealth of cheap, better second-hand hardware out there, such as NUCs with Celerons.

Model intelligence should be part of your equation as well, unless you love loads and loads of hidden technical debt and context-eating, unnecessarily complex abstractions.

GPT-OSS 20B is smart enough, but the context window is tiny once you feed it enough files. I wonder if you could make a dumber model with a massive context window that acts as a middleman to GPT.
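That middleman idea is basically a two-stage pipeline: a cheap long-context model condenses the files first, and only the condensed notes go to the smarter but context-limited model. A rough sketch against a local Ollama server (model tags and prompts are illustrative placeholders):

```python
import requests

OLLAMA = "http://localhost:11434/api/generate"

def ask(model, prompt):
    # One non-streaming call to a local Ollama server.
    r = requests.post(OLLAMA, json={"model": model, "prompt": prompt, "stream": False}, timeout=300)
    r.raise_for_status()
    return r.json()["response"]

def answer_with_middleman(files, question):
    # Stage 1: the "dumber" model condenses each file to what matters for the question.
    notes = [
        ask("llama3.2:3b", f"Summarize only the parts relevant to: {question}\n\n{text}")
        for text in files
    ]
    # Stage 2: the smarter, context-limited model answers from the condensed notes.
    condensed = "\n\n".join(notes)
    return ask("gpt-oss:20b", f"Using these notes:\n{condensed}\n\nAnswer: {question}")
```

Whether the summaries lose the details that actually matter is the open question, of course.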

Matches my experience.

Just have it open a new context window. The other thing I wanted to try is making a LoRA, but I'm not sure how that works properly; it suggested a whole other model, but it wasn't a pleasant experience, since it's not as obvious as it is with diffusion models for images.
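For what it's worth, the mechanical part of a LoRA is fairly small with Hugging Face's peft library; a bare-bones sketch (the base model, target modules, and hyperparameters below are illustrative guesses, not a tuned recipe):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-Coder-7B-Instruct"   # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the base weights and trains small low-rank adapter matrices
# injected into the attention projections.
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # which layers get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()        # typically well under 1% of the base model
# ...then train with a normal Trainer / SFT loop on your own examples.
```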

How do you evaluate this other than by anecdote, and how do we know your experience isn't due to how you use them?

You can evaluate it as anecdote. How do I know you have the level of experience necessary to spot these kinds of problems as they arise? How do I know you're not just another AI booster with a financial stake poisoning the discussion?

We could go back and forth on this all day.

You got very defensive. It was a useful question - they were asking in terms of using a local LLM, so at most they might be in the business of selling Raspberry Pis, not proprietary LLMs.

Yeah, to me it's more poisonous that people reflexively believe any pushback must be wrong, because people feel empowered regardless of any measurement that might point out that people only (maybe) get out of LLMs what they put into them, and even then we can't be sure. That this situation exists, and that people have been primed with a complete triangulation of all the arguments, simply isn't healthy. We should demand independent measurements instead of the fumbling in the dark of the current model benchmarks... or admit that measuring them isn't helpful and that, as a parent comment maybe alluded to, it can only be described as anecdote, with no discernible difference between many models.

The capability of the model itself is presumably a more important question than those other two, no?

MI50 is cheaper

This is some sort of joke, right?