Everything runs on a π if you quantize it enough!
I'm curious about the applications though. Do people randomly buy 4xRPi5s that they can now dedicate to running LLMs?
I'd love to hook my development tools into a fully-local LLM. The question is context window and cost. If the context window isn't big enough, it won't be helpful for me. I'm not gonna drop $500 on RPis unless I know it'll be worth the money. I could try getting my employer to pay for it, but I'll probably have a much easier time convincing them to pay for Claude or whatever.
It's sad that Pis are now so overpriced. They used to be fun little tinker boards that were semi-cheap.
The Raspberry Pi Zero 2 W is as fast as a Pi 3, way smaller, and only costs $13 I think.
The high end Pis aren’t $25 though.
The Pi 4 is still fine for a lot of low end use cases and starts at $35. The Pi 5 is in a harder position. I think the CM5 and Pi 500 are better showcases for it than the base model.
Between the microcontrollers, the Zero models, the Pi 4, and the Pi 5, they have quite a full range, from very inexpensive and low-power boards to moderate price/performance SBCs.
One of the bigger problems with the Pi 5 is that many of the classic Pi use cases don't benefit from more CPU than the Pi 4 had. PCIe is nice, but you might as well go CM5 if you want something like that. The 16GB model would be more interesting if it had the GPU/bandwidth to do AI/tokens at a decent rate, but it doesn't.
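A rough back-of-envelope sketch of why bandwidth is the limiter; the ~17 GB/s figure for the Pi 5's LPDDR4X and the model size are assumptions, not measurements:

    # Token generation is roughly memory-bandwidth bound: each token needs
    # about one full pass over the weights, so tokens/s <= bandwidth / model size.
    def max_tokens_per_sec(bandwidth_gb_s, model_size_gb):
        return bandwidth_gb_s / model_size_gb

    pi5_bandwidth = 17.0   # GB/s, approx LPDDR4X on a Pi 5 (assumed)
    model_q4 = 4.5         # GB, an ~8B model at 4-bit quantization (assumed)
    print(round(max_tokens_per_sec(pi5_bandwidth, model_q4), 1))  # ~3.8 tok/s ceiling, before overhead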
I still think using any other brand of SBC is an exercise in futility though. Raspberry Pi products have the community, support, ecosystem behind them that no other SBC can match.
> I'd love to hook my development tools into a fully-local LLM.
Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.
So ... using an rpi is probably not what you want.
I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There's still real value there, though also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.
I'd love to hear more about what you're running, and on what hardware. Also, what is your use case? Thanks!
So I am running Ollama on Windows with a 10700K and a 3080 Ti. I'm using models like Qwen3-coder (4/8b), Qwen2.5-coder 15b, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second depending on the model).
My use case is custom software that I build and host that leverages LLMs, for example for home automation (domotica), where I use my Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. Currently I'm looking at fine-tuning these types of models for work; I'm a senior dev in finance.
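If anyone wants to reproduce the rough tokens/sec numbers, here's a minimal sketch against Ollama's local HTTP API; the model tag is a placeholder for whatever you've pulled, and the rate comes from the eval_count/eval_duration fields in the response:

    # Minimal sketch: query a local Ollama server and estimate tokens/sec.
    # Assumes Ollama is running on its default port and the model below is pulled.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "qwen2.5-coder:14b",  # placeholder tag; use your own
            "prompt": "Write a C function that reverses a singly linked list.",
            "stream": False,
        },
        timeout=300,
    )
    data = resp.json()
    print(data["response"])
    print(round(data["eval_count"] / (data["eval_duration"] / 1e9), 1), "tok/s")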
Thank you. I'll take a look at Bropilot when I get set up locally.
Have a great week.
> Karpathy said in his recent talk, on the topic of AI developer-assistants: don't bother with less capable models.
Interesting because he also said the future is small "cognitive core" models:
> a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing.
https://xcancel.com/karpathy/status/1938626382248149433#m
In which case, a Raspberry Pi sounds like what you need.
It's not at all trivial to build a "small but highly capable" model. Sacrificing world knowledge is something that can be done, but only to an extent, and that isn't a silver bullet.
For an LLM, size is a virtue - the larger a model is, the more intelligent it is, all other things being equal - and even aggressive distillation only gets you so far.
Maybe with significantly better post-training, a lot of distillation from a very large and very capable model, and extremely high quality synthetic data, you could fit GPT-5 Pro tier of reasoning and tool use, with severe cuts to world knowledge, into a 40B model. But not into a 4B one. And it would need some very specific training to know when to fall back to web search or knowledge databases, or delegate to a larger cloud-hosted model.
And if we had the kind of training mastery required to pull that off? I'm a bit afraid of what kind of AI we would be able to train as a frontier run.
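For anyone unfamiliar, the distillation mentioned above mechanically means training the small model to match the big model's softened output distribution. A toy numpy sketch with made-up logits, illustrative only:

    # Logit distillation: the student is trained to match the teacher's softened
    # next-token distribution (temperature T). Toy example, not a training loop.
    import numpy as np

    def softmax(logits, T=1.0):
        z = logits / T
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def distill_loss(student_logits, teacher_logits, T=2.0):
        """KL(teacher || student) over softened distributions, scaled by T^2."""
        p = softmax(teacher_logits, T)      # teacher target
        q = softmax(student_logits, T)      # student prediction
        return (T ** 2) * np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9)))

    teacher = np.array([4.0, 1.0, 0.5, -2.0])   # made-up logits over a tiny vocab
    student = np.array([2.0, 1.5, 0.0, -1.0])
    print(distill_loss(student, teacher))       # lower is better; drive this down with SGD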
Nobody said it's trivial.
I'm kind of shocked so many people are willing to ship their code up to companies that built their products on violating copyright.
It's a tough thing, I'm a solo dev supporting ~all at high quality. I cannot imagine using anything other than $X[1] at the leading edge. Why not have the very best?
Karpathy elides that he is an individual. We expect to find a distribution of individuals, such that a nontrivial number of them are fine with 5-10% off leading-edge performance. Why? At the very least, free as in beer. Beyond that, concerns about connectivity, IP rights, and so on.
[1] gpt-5 finally dethroned sonnet after 7 months
Today's qwen3 30b is about as good as last year's state of the art. For me that's more than good enough. Many tasks don't require the best of the best either.
So much this: people acting as if local models were useless when they were in awe of last year's proprietary models, which were not any better…
Mind linking to "his recent talk"? There's a lot of videos of him so it's a bit difficult to find what's most recent.
https://www.youtube.com/watch?v=LCEmiRjPEtQ
Ah that one. Thanks!
I think the problem is that getting multiple Raspberry Pis is never the cost-effective way to run heavy loads.
$500 gets you about six 8GB RPi 5s or four 16GB ones, excluding accessories and the other equipment needed to get this working.
You'll be much better off spending that money on something else more useful.
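Quick sanity check on that $500; the board prices below are rough list prices, assumed:

    budget = 500
    pi5_8gb, pi5_16gb = 80, 120   # assumed per-board prices, USD
    print(budget // pi5_8gb, "x 8GB boards,", (budget // pi5_8gb) * 8, "GB total RAM")
    print(budget // pi5_16gb, "x 16GB boards,", (budget // pi5_16gb) * 16, "GB total RAM")
    # ~6 x 8GB (48 GB) or ~4 x 16GB (64 GB), before PSUs, storage, networking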
> $500
Yeah, like a Mac Mini or something with better bandwidth.
Raspberry Pis going up in price makes them very unattractive, since there is a wealth of cheap, better second-hand hardware out there, such as NUCs with Celerons.
Model intelligence should be part of your equation as well, unless you love loads and loads of hidden technical debt and context-eating, unnecessarily complex abstractions
GPT-OSS 20B is smart enough, but the context window is tiny once you feed it enough files. Wonder if you can make a dumber model with a massive context window that's a middleman to GPT.
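The middleman idea could look something like this: a small local model condenses each file, and only the digest goes to the bigger model. A sketch against Ollama's API; the model tags and file paths are placeholders:

    # Sketch of the "dumb middleman": a small model condenses files, the bigger
    # (smarter) model only sees the condensed context.
    import requests

    OLLAMA = "http://localhost:11434/api/generate"

    def ask(model, prompt):
        r = requests.post(OLLAMA, json={"model": model, "prompt": prompt, "stream": False})
        return r.json()["response"]

    def condense(path):
        code = open(path).read()
        return ask("qwen2.5-coder:3b",   # small, fast middleman (assumed tag)
                   f"Summarize the public API and key logic of this file in 10 lines:\n{code}")

    files = ["src/server.py", "src/auth.py"]          # placeholder paths
    digest = "\n\n".join(f"## {p}\n{condense(p)}" for p in files)
    answer = ask("gpt-oss:20b",                        # the smarter model (assumed tag)
                 f"Given these file summaries:\n{digest}\n\nWhere is login handled?")
    print(answer)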
Matches my experience.
Just have it open a new context window. The other thing I wanted to try is making a LoRA, but I'm not sure how that works properly; it suggested a whole other model, and it wasn't a pleasant experience since it's not as obvious as diffusion models for images.
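For what a LoRA actually is at the weight level (a frozen matrix plus a tiny learned low-rank delta), here's a toy numpy sketch; the shapes are made up for illustration:

    # LoRA: keep the pretrained weight W frozen and learn a low-rank update B @ A,
    # scaled by alpha / r. Only A and B are trained.
    import numpy as np

    d_out, d_in, r, alpha = 512, 512, 8, 16
    W = np.random.randn(d_out, d_in)      # frozen pretrained weight
    A = np.random.randn(r, d_in) * 0.01   # trainable, tiny
    B = np.zeros((d_out, r))              # trainable, zero-init so the delta starts at 0

    def forward(x):
        return W @ x + (alpha / r) * (B @ (A @ x))

    x = np.random.randn(d_in)
    print(forward(x).shape)               # (512,) - same shape as the original layer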
How do you evaluate this except for anecdote and how do we know your experience isn't due to how you use them?
You can evaluate it as anecdote. How do I know you have the level of experience necessary to spot these kinds of problems as they arise? How do I know you're not just another AI booster with financial stake poisoning the discussion?
We could go back and forth on this all day.
you got very defensive. it was a useful question - they were asking in terms of using a local LLM, so at best they might be in the business of selling raspberry pis, not proprietary LLMs.
Yeah, to me it's more poisonous that people reflexively believe any pushback must be wrong. People feel empowered regardless of any measurement that might show they only get out of LLMs (maybe) what they put into them, and even then we can't be sure. That this situation exists, and that people have been primed with a complete triangulation of all the arguments, simply isn't healthy. We should demand independent measurements instead of the fumbling in the dark of the current model benchmarks, or admit that measuring them isn't helpful and, as a parent comment maybe alluded to, the results can only be described as anecdote, with no discernible difference between many models.
Capability of the model itself is presumably the more important question than those other two, no?
MI50 is cheaper
This is some sort of joke right?
Sometimes you buy a Pi for one project, start on it, then buy another for a different project; before you know it, none are complete and you have ten Raspberry Pis lying around across various generations. ;)
Arduino hobbyist, same issue.
Though I must admit to first noticing the trend decades before discovering Arduino when I looked at the stack of 289, 302, and 351W intake manifolds on my shelf and realised that I need the width of the 351W manifold but the fuel injection of the 302. Some things just never change.
I have different model Raspberry Pis and I'm having a hard time justifying buying a 5... but if I can run LLMs off one or two... I just might. I guess what the next Raspberry Pi needs is a genuinely impressive GPU that COULD run small AI models, so people will start cracking at it.
I have clusters of over a thousand Raspberry Pis with generally 75% of their compute and 80% of their memory completely unused.
That’s an interesting setup. What are you doing with that sort of cluster?
99.9% of enthusiast/hobbyist clusters like this are exclusively used for blinkenlights
Blinkenlights are an admirable pursuit
That wasn't a judgement! I filled my homelab rack server with mechanical drives so I can get clicky noises along with the blinky lights
Good ol' Amdahl in action.
That sounds awesome, do you have any pictures?
Is it solar powered?
Depends on the model: if you have a sparse MoE model, you can divide it up across smaller nodes. Your dense 30b models, though, I do not see flying anytime soon.
An Intel Arc Pro B50 in a dumpster PC would serve you far better for this model (not enough RAM for a dense 30b, alas), get close to 20 tokens a second, and cost so much less.
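Rough numbers behind "sparse splits nicely, dense doesn't"; all figures are assumptions for illustration:

    # What matters per generated token is how many weights have to be streamed.
    BYTES_PER_PARAM = 0.5          # ~4-bit quantization (assumed)

    dense_active = 30e9            # dense 30B: every parameter touched per token
    moe_active = 3e9               # e.g. a 30B-A3B MoE: only ~3B params active per token

    for name, active in [("dense 30B", dense_active), ("MoE 30B-A3B", moe_active)]:
        gb_per_token = active * BYTES_PER_PARAM / 1e9
        print(f"{name}: ~{gb_per_token:.1f} GB of weights read per token")
    # The MoE model reads ~10x less per token, so per-board bandwidth and the slow
    # links between nodes hurt far less when you shard the experts across boards.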
For $500 you may as well spend an extra $100 and get a Mac mini with an m4 chip and 256gb of ram and avoid the headaches of coordinating 4 machines.
I don't think you can get 256 gigs of ram in a mac mini for $600. I do endorse the mac as an AI workbench tho
I think it serves a good test bed to test methods and models. We'll see if someday they can reduce it to 3... 2... 1 Pi5's that can match performance.
"quantize enough"
though at what quality?
Quantity has a quality all its own.
I mean, at this point it's more of a "proof-of-work" with a shared BP; I could definitely see some domotics hacker get this running. Hell, maybe I'll do it too if I have some spare time and want to make something like Alexa with customized stuff. It would still need text-to-speech and speech-to-text, but that's not really the topic of his setup. Even for professional use, if it's really usable, why not just spawn Qwen on ARM if that's cheaper? There are a lot of ways to read and leverage such a benchmark.