I bet there’s gonna be a banger of a Mac Studio announced in June.
Apple really stumbled into making the perfect hardware for home inference machines. Does any hardware company come close to Apple in terms of unified memory and single machines for high throughput inference workloads? Or even any DIY build?
When it comes to the previous “pro workloads,” like video rendering or software compilation, you’ve always been able to build a PC that outperforms any Apple machine at the same price point. But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.
It’s simply not possible to DIY a homelab inference server better than the M3+ for inference workloads, at anywhere close to its price point.
They are perfectly positioned to capitalize on the next few years of model architecture developments. No wonder they haven’t bothered working on their own foundation models… they can let the rest of the industry do their work for them, and by the time their Gemini licensing deal expires, they’ll have their pick of the best models to embed with their hardware.
> But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.
Nvidia outperforms Mac significantly on diffusion inference and many other forms. It’s not as simple as the current Mac chips are entirely better for this.
But where are you going to find an Nvidia GPU with 128+ GB of memory at an enthusiast-compatible price?
You don’t need it if you use llama.cpp on Windows, or if you compile it on Linux with CUDA 13 and the correct kernel HMM support, and you’re only using MoE models (which, tbh, you should be doing anyways).
What does MoE have to do with it? Aside from Flash-MoE, which supports exactly one model and only on macOS, you still need to load the entire model into memory. You also don't know which experts are going to be activated, so it's not like you can predict which ones need to be loaded.
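A rough sketch of the capacity side of this argument, with made-up numbers (the 120B total / 12B active split and the 4-bit weights are assumptions for illustration, not any specific model). MoE reduces how many weights are read per token, not how many have to be stored somewhere:

```python
def weight_gigabytes(params_billions: float, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in GB (ignores KV cache and activations)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9


# Hypothetical MoE model: every expert must be stored, only some are read per token.
TOTAL_PARAMS_B = 120   # assumption: all experts combined
ACTIVE_PARAMS_B = 12   # assumption: parameters actually used for one token
BITS = 4               # assumption: 4-bit quantized weights

resident_gb = weight_gigabytes(TOTAL_PARAMS_B, BITS)  # has to live in VRAM, RAM, or on disk
touched_gb = weight_gigabytes(ACTIVE_PARAMS_B, BITS)  # read for each generated token

print(f"stored: {resident_gb:.0f} GB, read per token: {touched_gb:.0f} GB")
```

Which experts make up the per-token slice changes from token to token, so the full set still has to be reachable quickly from wherever it is stored.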
That might even be true, but how large is the TAM for such machines?
Some Chinese sources sell modded Nvidia GPUs with extra VRAM. They're quite affordable in comparison to even a Mac Pro.
Any links to them? Never heard of this..
I've seen a guy who sells modded 2080 Ti with 22gb for $500
https://www.tomshardware.com/pc-components/gpus/chinese-work...
There's also unreleased Nvidia engineering samples of cards with doubled VRAM like this - https://www.reddit.com/r/nvidia/comments/1rczghu/update_unre...
It’s been going on for a while. Search YouTube or the web for 48gb 4090 (this is one of the most popular modded Nvidia cards), Nvidia of course never officially made a 4090 with this much memory.
There are some on sale via eBay right now. The memory controllers on some Nvidia gpus support well beyond the 16-24gb they shipped with as standard, and enterprising folks in China desolder the original memory chips and fit higher capacity ones.
Go to eBay and search for RTX 4090 48GB. There are plenty of them with prices around $3.5k.
And how much do you trust Chinese hardware?
Given that most of mine, and probably yours, and probably most of the world's computers are in fact made in China one way or another, some higher percentage than others, I'm guessing most of us trust our hardware enough to continue using it.
When there's no one left to trust, maybe you need to re-evaluate your criteria.
I wouldn't say that's true or even likely. It's completely possible to be in a pit of vipers where every single snake is venomous, and that is pretty much what we are seeing: With technological advances, there is a certain subset of people that will use them primarily to solidify their power and control over others. There is no utopian society right now whose government doesn't look to spy through technology, which of course is best set up at time of manufacture.
Agreed. Unless you have full control over the production chain to fully produce a device, you are subject to the whims and desires of those who preside over such technological feats that we take for granted in our daily lives.
To the original point, it's safe to say that highlighting a nationality with regards to trust is baseless and without merit, as would be for any other topic (men/women from x are y, z food is better here, etc..). Real life is much more complicated and nuanced past nationalities. Some might call it FUD (fear, uncertainty and doubt) but there's always a deeper rationale at the individual level as well.
Rather than people being wary of Chinese in general, it's more that there is a high degree of government control exercised in China and they are known to be very strategic with long-term planning in regards to technology control both for spying and actual remote control of devices. We are all just looking for the least bad option. It's not like devices from other countries are immune, but they are often less organized so there is a better chance of avoiding the Chinese level of planned access.
It does seem like pretty low risk in this specific case so I agree OP's comment was a bit over the top, but I would have no way to make anything resembling even an educated guess as to how far their programs go.
The Mac is also Chinese hardware
It would be hilarious if you are using a Lenovo device right now.
let alone competing on energy consumption!
The Nvidia DGX Spark is exactly this and in the same price and performance bracket.
Sadly, memory bandwidth is abysmal compared to Apple chips - 273 GB/s vs 614 GB/s on M5 Max for similar price. Even though fp4 compute is faster, it doesn't help for all the decode heavy agentic workflows.
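A back-of-envelope sketch of why bandwidth dominates decode: each generated token has to read the active weights once, so bandwidth divided by bytes read per token is a ceiling on single-stream tokens/s. The 273 and 614 GB/s figures are the ones quoted above; the 20 GB of weights per token is an assumption, and this ignores KV cache reads, compute time, and batching.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_read_per_token_gb: float) -> float:
    """Upper bound on single-stream decode speed if memory bandwidth is the only limit."""
    return bandwidth_gb_s / weights_read_per_token_gb


WEIGHTS_GB = 20.0  # assumption: GB of weights read per generated token

spark = decode_ceiling_tokens_per_s(273, WEIGHTS_GB)   # bandwidth figure quoted above
m5_max = decode_ceiling_tokens_per_s(614, WEIGHTS_GB)  # bandwidth figure quoted above

print(f"{spark:.1f} vs {m5_max:.1f} tokens/s ceiling")
```

Real numbers land below these ceilings, but the ratio between the two machines should roughly hold for decode-heavy work.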
You can still buy used 3090 cards on ebay. 5 of them will give you 120GB of memory and will blow away any mac in terms of performance on LLM workloads. They have gone up in price lately and are now about $1100 each, but at one point they were $700-800 each.
I don't see how 5x 3090's is a better option than an M3 Ultra Mac studio.
The mac will just work for models as large as 100B, can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.
You can certainly daisy chain several 3090's together but it doesn't work seamlessly.
> You can certainly daisy chain several 3090's together
It's not "daisy chaining"; the 3090 has NVLink.
Really? How would you NVLink more than 2 3090's?
> The mac will just work for models as large as 100B, can go higher with quantized models. And power draw will be 1/5th as much as the 3090 setup.
This setup will work for 100B models as well. And yes, the Mac will draw less power, but the Nvidia machine will be many times faster. So depending on your specific Mac and your specific Nvidia setup, the performance per watt will be in the same ballpark. And higher absolute performance is certainly a nice perk.
> You can certainly daisy chain several 3090's together but it doesn't work seamlessly.
Citation needed; there's no "daisy chaining" in the setup I describe, and low level libraries like pytorch as well as higher level tools like Ollama all seamlessly support multiple GPUs.
I think it's bad form to say "citation needed" when your original claim didn't include citations.
Regardless - there's a difference between training and inference. And pytorch doesn't magically make 5 gpus behave like 1 gpu.
How much does it cost to have an electrician wire up 240v circuit just to power the thing?
The machine I’m describing works just fine on a dedicated 15A 120V circuit.
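A quick sanity check on the circuit claim, as a sketch. The 80% continuous-load derating is a common rule of thumb, the 250 W for the rest of the system is a guess, and the 225 W and 350 W per-GPU figures are the power-limit range mentioned elsewhere in this thread.

```python
def circuit_budget_watts(amps: float, volts: float, continuous_derate: float = 0.8) -> float:
    """Usable continuous power on a circuit (derate is an assumed rule of thumb)."""
    return amps * volts * continuous_derate


def fits(gpu_count: int, gpu_watts: float, rest_of_system_watts: float, budget_watts: float) -> bool:
    """True if the whole machine stays within the circuit budget."""
    return gpu_count * gpu_watts + rest_of_system_watts <= budget_watts


budget = circuit_budget_watts(15, 120)  # dedicated 15A 120V circuit
REST_OF_SYSTEM_W = 250                  # assumption: CPU, fans, drives, PSU losses

throttled_ok = fits(5, 225, REST_OF_SYSTEM_W, budget)  # GPUs power-limited to 225 W
stock_ok = fits(5, 350, REST_OF_SYSTEM_W, budget)      # GPUs at 350 W each

print(budget, throttled_ok, stock_ok)
```

Under these assumptions five cards only fit on the circuit when they are power-limited.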
Where are you gonna find Apple hardware with 128GB of memory at enthusiast-compatible price?
The cheapest Apple desktop with 128GB of memory shows up as costing $3499 for me, which isn't very "enthusiast-compatible", it's about 3x the minimum salary in my country!
Apple is not catering to minimum salaries in poor countries. Does this really need to be explained?
$3499 is definitely enthusiast compatible. That's beefy gaming PC tier, which is possibly the canonical example of an enthusiast market.
This isn't tens of thousands of dollars for top tier Nvidia chips we're talking about.
Seems I misunderstood what an "enthusiast" is, I thought it was about someone "excited about something" but seems the typical definition includes them having a lot of money too, my bad.
I'm an immigrant to Canada, and yes, English has both literal meanings and colloquial meanings.
In the most literal meaning, absolutely, "Enthusiast" just means a person who likes something, is excited about something.
When it comes to market and products though, typically you'll see the word "Enthusiast" as mid-tier - something like: Consumer --> Enthusiast --> Professional (may have words like "Prosumer" in there as well etc:)
In that context, which is typically the one people will use when discussing product pricing and placement, "Enthusiast" is somebody who yes enjoys something, but does it sufficiently to be discerning and capable of purchasing mid-tier or above hardware.
So while a consumer photographer, may use their phone or compact or all-in-one camera, enthusiast photographer will probably spend $3000 - $5000 in camera gear. Equivalently, there are myriad gamers out there (on phones, consoles, Geforce Now, whatever:), an enthusiast gamer is assumed to have a dedicated gaming computer, probably a tower, with a dedicated video card, likely say a 5070ti or above, probably 32GB+ RAM, couple of SSDs which are not entry level, etc.
Again, this is not to say a person with limited budget is "not a real enthusiast", no gatekeeping is intended here; simply, if it may help, what the word means when it comes to market segmentation and product pricing :)
Additionally, "enthusiasts"/"hobbyists" tend to be willing to spend beyond practical utility, while professionals are more interested in pragmatism, especially in photography from what I can tell.
If you're an actual pro, you need your stuff to work properly, efficiently, reliably, when it's called for. When you're a hobbyist, it's sometimes almost the goal to waste money and time on stuff that really doesn't matter beyond your interest in it; working on the thing is the point, not the value it generates. Pros should spend money on good tools and research and knowledge, but it usually needs to be an investment, sometimes crossing over with hobbyist opinions.
A friend of mine who's a computer hobbyist and retail IT tech, making far far less than I do, spends comically more than me on hardware to play basically one game. He keeps up to date with the latest processors and all that stuff, he knows hardware in terms of gaming. I meanwhile—despite having more money available—have a fairly budget gaming PC that I did build myself, but contains entirely old/used components, some of which he just needed to get rid of and gave me for free, and I upgrade my main mac every 5 years or something. I only upgrade when hardware is really getting in my way.
>> So while a consumer photographer, may use their phone or compact or all-in-one camera, enthusiast photographer will probably spend $3000 - $5000 in camera gear.
It's interesting that you chose photographers as the example here. In many cases that I've seen, enthusiast photographers spend much more than professional photographers on their gear because the photographers make their money with their gear and therefore need to justify it, while the enthusiasts are often tech people, successful doctors, etc., who spend lots and lots of money on their hobbies...
In any case, your point stands, that "enthusiast" computer users would easily spend $3-4K or more on gear to play games, train models, etc.
$3.5k is a lot of money, but not a ton by American hobby standards. It's easy to spend multiples, even orders of magnitude more than that on hobbies like fishing, wine, sports tickets, concerts, scuba, travel, being a foodie, golf, marathons, collectibles, etc.
It's out of reach for lots of people, even in developed countries. But it's easily within reach for loads of people that care more about computing than other stuff.
In June 1977, the base Apple II model with 4 KB of RAM was $1,298 (equivalent to about $6,900 in 2025), and with the maximum 48 KB of RAM it was $2,638 (equivalent to about $14,000 in 2025).
(Source: Wikipedia via Claude Opus)
Wow, 48k for $14000. Now you can get a MBP with a million times more memory for $3500 or so. Whereas that CPU was clocked at 1 MHz, so CPUs are only several thousand times faster, maybe something like 30,000 times faster if you can make use of multi-core.
I'd argue that some of those are more consumption and activity than hobby depending on how they're engaged with, and that people use the word "hobby" too loosely, but would agree that Americans in-particular consume at obscene rates.
Golf equipment, mountaineering equipment, skiing and snowboarding lift tickets and gear, a single excessive graphics card that's only used for increasing frame rates marginally, or basically a single extra feature on a car, are all things that accumulate quite quickly. Some are clearly more superfluous than others and cater to whales, while some are just expensive by nature and aren't attempting to be anything else
Those are the prices for just buying equipment, which at least retain some kind of value. 3 million+ American kids are enrolled in competitive soccer with annual clubs dues between $1K and $5K, and that money is just gone at the end of the year. Basically none of those kids are going to have a career in soccer, so it's clearly a hobby, and everyone knows it. And soccer isn't even the most popular sport!
Ya, I guess that's another category entirely. The cost of enrolling a kid in anything, potential travel involved etc..
I live in America, I am very well compensated. Have been for 15 years now. $3500 is a lot of money. A lot. There is a tiny bubble of us tech folks who think it is accessible to most people. It is not. It is also the same reason Macs are still a niche. Don't take your circles to be the standard, it is very very far from it, especially if you think $3500 is not a lot of money.
It is easy to confirm this, just look at the sales number of these $3500 devices. It is definitely not an enthusiast price point, even in the US.
It's not nothing for most people... it's more than a month of rent/mortgage for a significant number of Americans even. But if it's your primary hobby, it's not completely out of reach, and it's not something you necessarily spend every year. A lot of people will upgrade to a new computer every 3-5 years and maybe upgrade something in between those complete system upgrades.
I know plenty of people who don't make a lot of money (say top 25% or so) that will have a Boat or RV that costs more than a $3500 computer, and balk at the thought of spending that much on a computer. It just depends on where your interests are.
The first words I said: "$3.5k is a lot of money..."
There are tens of millions of top 10% income adults in America. So something can be both unaffordable to most people, and also easily accessible to very many people.
It’s a midrange to upper expense in the US if it’s your hobby. Most people don’t have a serious computer hobby but they golf, trade ATVs, travel, drink, etc.
There are something like 24 million millionaires in the United States... Estimates are that Americans spent $157 billion on pets in 2025.
There are a lot of people who could easily choose to spend $3,500 on a computer.
$3500 would have been 3–4 months' discretionary spending as a PhD student in Finland 15 years ago. A sum you might choose to spend once a year on something you find genuinely interesting.
Some people succumb to lifestyle creep or choose it deliberately. Others choose to live below their means when their income grows. The latter have a lot more money to spend on extras, or to save if that's what they prefer.
An enthusiast in the hobby space is by definition someone willing to pour much more money into it than someone less enthusiastic about whichever hobby we are talking about.
Well, and also has a bunch of money, not just willing. I guess locally we don't really have that difference, as two other commenters here described it, that's why I had to update my local understanding of "enthusiast". Usually we use it for how engaged/interested a person is, regardless of how much money they can or are willing to use.
Learned something new today at least, so that's cool :)
Yes, when tech gear is sold as 'enthusiast' gear, it is almost invariably the most expensive non-professional tier of equipment. That is roughly the common understanding: Expensive and focused on features more than security required for public use; while remaining within reach of at least some individuals, not only corporations.
In a hobby where there are (strong) HW requirements, it mostly takes for granted you have money to shell out for your hobby, indeed.
For an individual making median income in the US, it would cost 2% of your income to get a machine like this every 4-5 years. That's a matter of enthusiasm, not a matter of having a lot of money. Sorry that income is less where you are, but the people talking about the product tier are using American standards.
1200$ as the minimum salary covers probably 70% of Europe by population?
The Neo has enough power to do small LLM testing and pretty much anything else a bit slowly, and costs $600?
Neo tops at 8GB RAM. What LLM are you going to run there? Functiongemma?
It can absolutely do some ML inference on it, but not much in terms of LLMs.
Did you need to add poor? Unless Apple isn't catering to the US
I spent around that on my current personal desktop... 9950X, 2x48gb ddr5/@6000, RX 9070XT, 4tb gen 5 nvme + 4tb gen 4 nvme. I could have cut the cpu to a 9800x3d and ram to 32gb with a different GPU if my needs/usage were different. I'm running in Linux and don't game too much.
That said, a higher end gaming setup is going to cost that much and is absolutely in the enthusiast realm. "enthusiast" doesn't mean compatible with "minimum wage"
The original Mac with 128KB of memory cost $2,495 when Apple released it in 1984. It would be about 3x that in today's money.
I came here to say the same. Even with my student discount price of $1000, that's over 3K in today's dollars.
We are so freaking spoiled by the cheap cost of compute now.
> it's about 3x the minimum salary in my country!
Enthusiast compute hardware doesn't cater to the people on the minimum salary in any country, let alone developing nations. When Ferrari makes a car they don't ask themselves if people on minimum salary will be able to afford them.
I'm in one of the two poorest EU member states, and Apple and Microsoft Xbox don't even bother to have a direct-to-customer store presence here; you buy them from third-party retailers.
Why? Probably because their metrics show people here are too poor to afford their products en masse to be worth operating a dedicated sales entity. Even though plenty of people do own top of the line MacBooks here, it's just the wealthy enthusiast niche, but it's still a niche for the volumes they (wish to) operate at. Why do you think Apple launched the Mac Neo?
Right, I think maybe we're then talking about "upper class enthusiasts" or something in reality then? I understood that to just be about the person, not what economic class they were in, maybe I misunderstood.
Yes, it's a different definition.
Enthusiast in this context more or less means you are excited enough about something to get a level above what normal people should get and just below professional pricing. An enthusiast camera body can be 2000 euros.
I would say an enthusiast computer is 2-4k.
It really depends what you meant with minimum salary (yearly?) because paying 3 months of salary for a computer like that isn't far fetched. You're not using this to generate recipes for cookies. An enthusiast level car is expensive as well.
Being an enthusiast in computer hardware assumes enthusiasm about hardware, not about "hardware on a budget". It doesn't matter if it's affordable or not.
>Right, I think maybe we're then talking about "upper class enthusiasts" or something in reality then?
Why? Enthusiasts are by definition people for whom value for money is not the main driver but top performance and cutting edge novelty at any cost. Affording enthusiast computer hardware is not a human right, the same way affording a Lamborghini or McMansion isn't.
But you don't need to buy a Lamborghini to do your grocery shopping or drive your kids to school, the same way you don't need an Nvidia 5090 or MacBook Pro Max to do your taxes or do your school work.
So the definition is fine as it is. It's hardware for people with very deep pockets, often called whales.
tell me what pc with an nvidia gpu can you buy with same memory and performance.
I never liked Apple hardware, but they are now untouchable since their shift to their own silicon for home hardware.
> tell me what pc with an nvidia gpu can you buy with same memory and performance.
And power consumption !
The performance per watt of Apple is unmatched.
This needs to be sold as the big ticket item for low level devs. Their chips are some of the most power efficient chips on the market right now.
Hoping they release a blade server version somehow.
Apple releasing anything enterprise or "server" related would be a pretty big pivot - let alone blades.
Nvidia's recent GPUs are more power-efficient than Apple Silicon in raster, training and inference workloads.
A blade server would get cancelled just like the Mac Pro for exactly the same reasons: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...
> Nvidia's recent GPUs are more power-efficient than Apple Silicon in raster, training and inference workloads.
I think you can do better than the proverbial Apples and Oranges comparison.
In terms of total system, "box on desk", Apple is likely to remain the performance per watt leader compared to random PC workstations with whatever GPUs you put inside.
Then ignore me, and go ask your local datacenter why Apple Silicon isn't on any of their racks.
I've owned some beefy computers in the past and this tiny little m4 mini on my desk blows them all out of the water easily. It's crazy.
Untouchable my ass. You get a PC that has an SSD glued to the motherboard, so if you run write-intensive workloads and that thing wears out, replacing it will have significant cost. Then there’s no PCIe slot to get any decent network card if you want to work more than one of them in unison; you’re stuck with that stupid Thunderbolt 5 while InfiniBand gives 10x network speeds. As for memory bandwidth, it’s fast compared to CPUs but any enterprise GPU dwarfs it significantly. The unified RAM is the only interesting angle.
Apple could have taken a chunk of the enterprise market now with that AI craze if they had made an upgradable and expandable server edition based on their silicon. But no, everything has to be bolted down and restricted.
This has changed since Sam Altman started buying up all the chip supply, raising prices on memory, storage, and GPUs for everyone, but it used to be the case that you could build a PC that was both cheaper and faster than a Mac for LLM inference, with roughly equal performance per watt.
You would use multiple *90-series GPUs, throttled down in terms of power. Depending on the GPU, the sweet spot is between 225-350W, where for LLM workloads you only lose 5-10% of performance for a ~50% drop in power consumption.
Combined with a workstation (Xeon/Epyc) CPU with lots of PCIe, you can support 6-7 such GPUs (or more, depending on available power). This will blow away the fastest Mac studio, at a comparable performance per watt.
Again, a lot of this has changed, since GPUs and memory are so much more expensive now.
Macs are great for a simpler all in one box with high memory bandwidth and middling-to-decent GPU performance, but they are (or were) absolutely not "untouchable."
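The power-limit claim above (lose 5-10% of performance for a ~50% drop in power) can be turned into a performance-per-watt figure with a couple of lines. This is only arithmetic on the numbers quoted in the comment, not a benchmark; on Nvidia cards the cap itself is commonly set with `nvidia-smi -pl`.

```python
def perf_per_watt_gain(perf_retained: float, power_retained: float) -> float:
    """Relative change in performance per watt after applying a power limit."""
    return perf_retained / power_retained


# Figures from the comment: 5-10% performance lost, about half the power used.
worst_case = perf_per_watt_gain(0.90, 0.50)
best_case = perf_per_watt_gain(0.95, 0.50)

print(f"perf/W improves by {worst_case:.1f}x to {best_case:.1f}x")
```

So if the quoted range holds, throttling nearly doubles efficiency, which is what makes the multi-GPU build competitive on performance per watt at all.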
With 6-7 GPUs and EPYC cpu it will also cost 2-3x more than a Mac Studio.
I think OP’s point was that it would do more than 2-3x the workload, thus them stating “blow it out of the water” and specifying “performance-per-watt”.
But they're pretty fast and can have loads of RAM, which would be prohibitively expensive with Nvidia.
A 128GB 2TB Dell Pro Max with Nvidia GB10 is about $4200, a Mac Studio with 128GB RAM and 2TB storage is $4100. So pretty comparable. I think Dell's pricing has been rocked more by the RAM shortage too.
Unfortunately the GB10 is incredibly bandwidth starved. You get 128GB of RAM, but only 270GB/s of bandwidth. The M3 Ultra Mac Studio gets you 820GB/s. (The M4 Max is at 410GB/s.) I'm not aware of any workload that gets the GB10 to its theoretical peak FLOPS.
You can't get a 128GB M3 Ultra, it's also more expensive. For some workloads the Studio is better, for others the GB10.
~not unified memory tho~
It is unified memory on this one
From the spec sheets I’m looking at, it is not. I’m seeing models of the Dell Pro Max with 128 GB of DDR5-6400 as CAMM2, then a separate memory of up to 24 GB on the GPU. CAMM2 does not make the memory unified.
There are also SO-DIMM options.
You're not looking at the right thing. Dell's naming is horrible. Dell Pro Max with GB10 (https://www.dell.com/en-us/shop/cty/pdp/spd/dell-pro-max-fcm...). It's a very different computer than what you're looking at and has 128GB LPDDR5X unified memory.
Thanks for pointing that out. I found a more informative article about that model at https://www.mcpgov.com/dell-pro-max-gb10
my bad
I took ~ to be a "singing tone" for some reason till I saw sibling and realized it might be an attempted strikethrough xD
That won't hold much benefit as SOCAMM2 and LPCAMM2 get more popular.
> So pretty comparable.
The Mac Studio almost certainly uses at least half the power
(educated guess, I'm too lazy to go look at all the spec sheets and run the numbers)
It's actually reversed. The GB10 chipset has a TDP of 140w, whereas M2/M3 Ultra pulls over 250w from the wall: https://support.apple.com/en-us/102027
> It's actually reversed. The GB10 chipset has a TDP of 140w, whereas M2/M3 Ultra pulls over 250w from the wall
Come on mate ... I think you and I both know I was talking about complete system here, not discrete components.
I'm pretty sure your total package (Dell Pro Max + GB10) will pull more from the wall.
I'm pretty sure you need to look up what you're talking about instead of making a guess.
The Dell Pro Max PSU + enclosure is only rated for 240w, it literally can't pull more than 250w from the wall without shorting itself.
> 240w
280w according to the spec sheet I just looked at.
Also just look at the graphs on Geerling's website. The Mac Studio eats the Dell for breakfast in a number of the tests: https://www.jeffgeerling.com/blog/2025/dells-version-dgx-spa...
Not quite, what is the vRAM bandwidth of each? The bandwidth is a huge contributor to LLM performance.
AFAIK, for the unified bandwidth, it depends mostly on the CPU, for M4 Max (I think it's the default today?) it does ~550 GB/s, while GB10 does ~270 GB/s, so about a 2x difference between the two. For comparison, RTX Pro 6000 does 1.8 TB/s, pretty much the same as what a 5090 does, which is probably the fastest/best GPUs a prosumer could reasonably get.
Do NVIDIA solutions also outperform the Apple M-series in performance per Watt?
No, that's why Apple uses performance per watt, not the actual performance ceiling, as the metric. In actual workloads where you'd need this power, actual performance is what matters, not PPW.
Probably comparable, but that's only with business-grade products, it's why Apple's current silicon is so remarkable on the market at the consumer level.
Thanks.
Nvidia isn't selling one-off home computers afaik. But yes in terms of datacenter cloud usage Nvidia performs.
GB300 DGX Station was announced last Monday.
It's going to cost far more than a diy machine with multiple lower end GPUs. Which is fine -- it's aimed at enterprise, not home labs.
https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...
Amusingly there's a macbook next to it in the pic, is this headless?
It has a HDMI port and its USB-C ports also support display out. But I believe most who buy it intend to use it headless. The machine runs Ubuntu 24.04 and has a slightly customised Gnome (green accents and an nvidia logo in GDM) as its desktop.
Jeff Geerling doing that 1.5TB cluster using 4 Mac Studios was pretty much all the proof needed to demo how the Mac Pro is struggling to find any place any more.
https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-stu...
That is the proof that what is left is a workaround, just like piling Minis on racks because Apple left the server space.
Also why Swift nowadays has to have good Linux support, if app developers want to share code with the server.
A workaround that works is better than an official solution that's barely adequate. Which is often the case.
Or just maybe, to use a Steve Jobs quote, one is holding it wrong and should look elsewhere.
People sneer at this Steve Jobs quote, but almost anybody working in tech has at some point quoted another, stronger, quote like "We tried to make the program idiot proof, but they keep making better idiots".
There's also: "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."
https://en.wikiquote.org/wiki/Rick_Cook
But those Thunderbolt links are slower than modern PCIe. If there's actually a M5-based Mac Studio with the same Thunderbolt support, you'll be better off e.g. for LLM inference, streaming read-only model weights from storage as we've seen with recent experiments than pushing the same amount of data via Thunderbolt. It's only if you want to go beyond local memory constraints (e.g. larger contexts) that the Thunderbolt link becomes useful.
Why everyone wants to live in dongle/external cabling/dock hell is beyond me. PCIe cards are powered internally with no extra cables. They are secure. They do not move or fall off of shit. They do not require cable management or external power supplies. They do not have to talk to the CPU through a stupid USB hub or a Thunderbolt dock. Crappy USB HDMI capture on my Mac led me to running a fucking PC with slots to capture video off of a 50 foot HDMI cable, that then streamed the feed to my Mac from NDI, because it was more reliable than the elgarbo capture dongle I was using. This shit is bad. It sucks. It's twice the price and half the quality of a Blackmagic Design capture card. But, no slots, so I guess I can go get fucked.
For anything that's even somewhat in the consumer space rather than pure workstation/professional, the main reason is that dongles can be used with a laptop but add-in cards can't. When ordinary consumer PCs (or even office PCs) are in the picture, laptops are a huge chunk of the target audience.
The market segments that can afford to ignore laptops and only target permanently-installed desktops are mostly those niches where the desktop is installed alongside some other piece of equipment that is much more expensive.
Wasn't streaming models from storage into limited memory a case where it was impressive that you could make the elephant dance at all?
If you want to get usable speeds from very large models that haven't been quantized to death on local machines, RDMA over Thunderbolt enables that use case.
Consumer PC GPUs don't have enough RAM, enterprise GPUs that can handle the load very well are obscenely expensive, Strix Halo tops out at 128 Gigs of RAM and is limited on Thunderbolt ports.
The bad performance you saw was with very limited memory and very large models, so streaming weights from storage was a huge bottleneck. If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit, at least until you're running huge contexts and most of the RAM ends up being devoted to that. Is the overall speed "usable"? That's highly subjective, but with local inference it's convenient to run 24x7 and rely on non-interactive use. Of course scaling out via RDMA on Thunderbolt is still there as an option, it's just not the first approach you'd try.
> If you gradually increase RAM, more and more of the weights are cached and the speed improves quite a bit
It'll increase a lot relative to the zero-RAM baseline. But it's still complete garbage compared to fitting the model in RAM. Even if you fit most of it in RAM you're still probably an order of magnitude slower than fitting all of it in RAM, with most of your time spent waiting for your SSD.
If you don't care about performance, you have a lot of options.
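For what it's worth, here's a back-of-envelope sketch of that claim. It assumes a dense model where every token reads all the weights once, ignores compute entirely, and uses made-up bandwidth figures (400 GB/s for RAM, 6 GB/s for the SSD) that are assumptions for illustration, not measurements of any machine.

```python
def tokens_per_second(model_gb, cached_fraction, ram_gbps=400.0, ssd_gbps=6.0):
    """Upper bound on decode speed when each token reads every weight once.

    ram_gbps and ssd_gbps are illustrative assumptions, not benchmarks.
    """
    # Time to read the part of the model that is cached in RAM.
    ram_time = model_gb * cached_fraction / ram_gbps
    # Time to stream the rest of the model from the SSD.
    ssd_time = model_gb * (1.0 - cached_fraction) / ssd_gbps
    return 1.0 / (ram_time + ssd_time)


if __name__ == "__main__":
    for frac in (0.0, 0.5, 0.9, 1.0):
        rate = tokens_per_second(100.0, frac)
        print(f"{frac:.0%} cached: {rate:.2f} tok/s")
```

With those assumed numbers, a 100 GB model with 90% of its weights cached comes out around 7 to 8 times slower than the fully cached case, because the SSD reads dominate the time per token.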
The proposition of a Mac Pro in the Apple Silicon world wasn't necessarily about performance, it was about the existence of the PCIe slots. I don't think AI becoming a workload for pro Macs means the Mac Pro doesn't have a place; people who were using Mac Pros for audio or video capture didn't stop doing that media work and switch to AI as a profession. That market just wasn't big enough to sustain the Mac Pro in the first place and Apple has finally acknowledged that fact.
I had a U-Audio PCI card in a Mac Pro during the Intel era of Macs. It was a chip to run their software plugins and the plugins are top of the line. I have a U-Audio box that runs over Thunderbolt now. I know there are people who need device slots, but it's vanishingly few. I'm disappointed that this category of machine is going away, but it stopped being for me in the Apple Silicon era.
so many peripherals now come in external boxes that communicate _incredibly quickly_ over Thunderbolt 4/5 that the need for PCIe is marginal, while the cost to support it is significant.
Wow spend 40k to get the same tokens/second in QWEN as you would on a 3090
I have a feeling that Mac fans obsess more about being able to run large models at unusably slow speeds instead of actually using said models for anything.
> Apple really stumbled into making the perfect hardware for home inference machines
For LLMs. For inference with other kinds of models where the amount of compute needed relative to the amount of data transfer needed is higher, Apple is less ideal and systems with lower memory bandwidth but more FLOPS shine. And if things like Google’s TurboQuant work out for efficient kv-cache quantization, Apple could lose a lot of that edge for LLM inference, too, since that would reduce the amount of data shuffling relative to compute for LLM inference.
Or just mean that you could run a 5x bigger model on Apple than before.
Well, since it's the kv-cache that TurboQuant optimizes, it means a five times bigger context fits into RAM, all other things being equal, not a five times bigger model. But, sure, with any given context size and the same RAM available, you can instead fit a bigger model, which also takes more compute to get the same performance.
Anything that increases the necessary compute to fully utilize RAM bandwidth in optimal LLM serving weakens Apple's advantage for that.
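A rough sketch of the arithmetic here, assuming the usual per-token KV-cache size of 2 (keys and values) x layers x KV heads x head dimension x bytes per value. The model shape (64 layers, 8 KV heads, head dimension 128) is hypothetical, and the 5x compression factor is just the figure from this thread, not a measured TurboQuant result.

```python
def kv_cache_bytes(context_len, n_layers=64, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Uncompressed KV-cache size in bytes for one sequence.

    The default model shape is a made-up example, not a real model.
    """
    # Factor of 2 because both keys and values are stored per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def max_context(budget_bytes, compression=1, **model_shape):
    """Longest context that fits in budget_bytes at a given compression ratio."""
    per_token = kv_cache_bytes(1, **model_shape)
    return budget_bytes * compression // per_token


if __name__ == "__main__":
    budget = 32 * 2**30  # 32 GiB set aside for the cache (assumption)
    print("fp16 cache:", max_context(budget), "tokens")
    print("5x smaller:", max_context(budget, compression=5), "tokens")
```

Under these assumptions the same 32 GiB holds 131072 tokens uncompressed and 655360 tokens at 5x, while the weights take exactly as much RAM as before.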
DGX workstations, expensive but allow PCI cards as well.
https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...
It's hilarious that not a single one of these has pricing listed anywhere public.
I don't think they expect anyone to actually buy these.
Most companies looking to buy these for developers would ideally have multiple people share one machine and that sort of an arrangement works much more naturally with a managed cloud machine instead of the tower format presented here.
Confirming my hypothesis, this category of devices is more or less absent in the used market. The only DGX workstation on eBay has a GPU from 2017, several generations ago.
Nvidia doesn’t list prices because they don’t sell the machines themselves. If you click through each of those links, the prices are listed on the distributor’s website. For example the Dell Pro Max with GB10 is $4,194.34 and you can even click “Add to Cart.”
I don't mean the small GB10s.
If you try to find the pricing of the GB300 towers even on the manufacturer sites, you'll see that it's not listed for any of the six or so models.
Because that's a different price point, that's getting near 100K, and the availability is very limited. I don't think they're even selling it openly, just to a bunch of partners...
The MSI workstation is the one that is showing some pricing around. Seems like some distributors are quoting USD 96K, with a wait time of 4 to 6 weeks [0]. Others say 90K and are also out of stock [1]
--
Isn't that because nobody has released one yet? They are brand new.
I don't think it's so odd, very few products above ~$50k have final prices listed for anyone to buy 1-click.
Workstations above 50k are not that uncommon.
Older Xeon-based workstations easily reach that number.
If you put a 50 or 80K workstation in the HP store, it will say:
"Purchasing limit reached. To complete your order and provide you with the best customer experience, please call 1-877-888-8235"
'Important' people in organizations get them. They either ask for them, or the team that manages the shared GPU resources gets tired of their shit and they just give them one.
Yes, I agree this is the use case.
Since the user here is not paying for it directly, the manufacturer does not have any incentive to list prices anywhere.
There were plenty of them around when I worked at Nvidia. They definitely exist.
You have seen plenty of third party GB300 DGX workstations?
How much do those workstations cost? All of the different manufacturers links on that page lack pricing info and you have to contact them for pricing.
Cheapest I know of is around $96k
$4000
$4k is for GB10 (DGX Spark reference design). $90-100k is for GB300 (DGX Station reference design).
> ...making the perfect hardware for home inference machines.
I really don't get why anybody would want that. What's the use case there?
If someone doesn't care about privacy, they can use for-profit services because they are basically losing money, trying to corner the market.
If they care about privacy, they can rent cloud instances in order to setup, run, close and it will be both cheaper, faster (if they can afford it) but also with no upfront cost per project. This can be done with a lot of scaffolding, e.g. Mistral, HuggingFace, or not, e.g. AWS/Azure/GoogleCloud, etc. The point being that you do NOT purchase the GPU or even dedicated hardware, e.g. Google TPU, but rather rent for what you actually need and when the next gen is up, you're not stuck with "old" gen.
So... what use case is left? Somebody who is both technical and very privacy conscious AND wants to do so offline despite having 5G or satellite connectivity pretty much anywhere?
I honestly don't get who that's for (and I did try dozens of local models, so I'm actually curious).
PS: FWIW https://pricepertoken.com might help but not sure it shows the infrastructure each rely on to compare. If you have a better link please share back.
> If they care about privacy, they can rent cloud instances in order to setup, run, close and it will be both cheaper, faster (if they can afford it) but also with no upfront cost per project. This can be done with a lot of scaffolding, e.g. Mistral, HuggingFace, or not, e.g. AWS/Azure/GoogleCloud, etc.
I'm a somewhat tech heavy guy (compiles my own kernel, uses online hosting, etc).
Reading your comment doesn't sound appealing at all. I do almost no cloud stuff. I don't know which provider to choose. I have to compare costs. How can I trust they won't peek at my data (no, a Privacy Policy is not enough - I'd need encryption with only me having the key). What do I do if they suddenly jack up the rates or go out of business? I suddenly need a backup strategy as well. And repeat the whole painful loop.
I'll lose a lot more time figuring this out than with a Mac Studio. I'll probably lose money too. I'll rent from one provider, get stuck, and having a busy life, sit on it a month or two before I find a fix (paying money for nothing). At least if I use the Mac Studio as my primary machine, I don't have to worry about money going to waste because I'm actually utilizing it.
And chances are, a lot of the data I'll use it with (e.g. mail) is sitting on the same machine anyway. Getting something on the cloud to work with it is yet-another-pain.
To your second issue/question, all the cloud providers offer CMEK services/features (and have for many years now).
> suddenly jack up the rates or go out of business?
There is basically no lock-in, you don't even "move" your image, your data is basically some "context" or a history of prompts which probably fits on a floppy disk (not even being sarcastic), so if you know the basics of containerization (Docker, podman, etc), which most likely the cloud provider even takes care of, then it takes literally minutes to switch from one to another. It's really not more complex than setting up a PHP server, the only difference is the hardware you run on and that's basically a dropdown button on a Web interface (if you don't want to have scripts for that too), then selecting the right image (basically NVIDIA support).
Consequently even if that were to happen (which I have NEVER seen! at worst it's like a 15% increase after years) then it would actually not matter to you. It's also very unlikely to happen based on the investment poured into the "industry". Basically everybody is trying to get "you" as a customer to rely on their stack.
... but OK, let's imagine that's not appealing to you, have you not done the comparison of what a Mac Studio (or whatever hardware) could actually buy otherwise?
Ok. I think I misunderstood. So the idea is to simply set up the LLM service on the server and access it with an API like I would with any LLM provider? This way whatever application I want to use it for stays at home?
That's a bit more appealing. How much would it cost per month to have it continually online?
Well it depends entirely on what you need. You can even do the training yourself on that infrastructure to rent if you want. The more you do yourself, the more private but also the more expensive it will be.
I don't want to make an ad here but I'm going to point to HuggingFace https://endpoints.huggingface.co (and to avoid singling them out just https://replicate.com/pricing too but I don't know them well) as an example with pricing.
The "beauty" IMHO of such solutions is that again you pay for what you want. If you want to use the endpoint only for 5min to test that the model and its API fits your need? OK. You want the whole month? Sure. You want 1 user, namely you? Fine, not a lot of power, you want your whole organization to use that endpoint? Scale up.
I'm going to give very rough approximation because honestly I'm not really into this so someone please adjust with source :
Apple Mac Studio M3 Ultra 96GB = $4K
~NVIDIA A100 with 80G ~ 10x perf compared to M3 Pro (obviously depends on models)
So on Replicate today one can get an A100 for ~$5/hr, which means $4K buys about a month of rental. But that's for 10x the speed, with electricity included. So very VERY approximately, if you use a Mac Studio for AI non-stop (day and night) for 10 months, then it's arguably worth it.
If you use it less, say 2hrs/day only for inference, then I imagine it takes a few years to reach the equivalent, and by that time I bet Replicate or HuggingFace is going to rent much faster setups for much cheaper, simply because that's what they have ALL done for the last few years.
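The break-even estimate can be written down as a small sketch. The inputs are the rough figures from this comment ($4K machine, ~$5/hr rental, 10x rental speedup), all of which are approximations; electricity for the local machine and future price drops are ignored, and the function and parameter names are just illustrative.

```python
def break_even_days(hardware_cost=4000.0, rental_per_hour=5.0,
                    rental_speedup=10.0, local_hours_per_day=24.0):
    """Days of local use after which buying matches renting for the same work.

    All defaults are the rough figures quoted in the thread, not real quotes.
    """
    # Hours of rental that the purchase price would pay for.
    rental_hours = hardware_cost / rental_per_hour
    # The local machine is assumed slower, so it needs more hours for that work.
    local_hours = rental_hours * rental_speedup
    return local_hours / local_hours_per_day


if __name__ == "__main__":
    print(f"24/7 use:    {break_even_days():.0f} days")
    print(f"2 hours/day: {break_even_days(local_hours_per_day=2.0):.0f} days")
```

Taking the 10x speedup at face value, continuous use breaks even after about 333 days, which is in the same ballpark as the 10 months above. At 2 hours a day the same arithmetic gives 4000 days, or 400 days if the speedup is ignored (rental_speedup=1.0).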
Well, full disclosure (despite my comments above): I'm not interested in buying a Mac Studio. I was merely explaining why I thought people may prefer it.
For my own use, I'm just looking at absolute price (and convenience).
I haven't explored open weights models, so I have no idea which I'd want. It would be great to get a "frontier" model like Minimax-M2.5, but at $10/hr, it's not worth it - let alone $40/hr for GLM-5. I'd have to explore use cases for cheaper models. Likely for things related to reading emails, I can get by with a much cheaper model.
If I set one of these up, how easy is it for me to launch one (on the command line on my home PC) and then shut it down? Right now, when I write any app (or use OpenCode), it's frictionless. My worry is that either turning it on will be a hassle, or even worse, I'll forget to turn it off and suddenly get a big pointless bill.
If there are any guides out there on how people manage all this, it would be much appreciated.
Honestly I doubt it's worth it, hence my suggestion to make a "cold" estimation of both options.
Well it's not exactly a guide and honestly it's quite outdated (because I stopped keeping track as I just don't get the quality of results I hope for versus huge trade-offs that aren't worth it for me) but I listed plenty of models and software solutions for self-hosting, at home or in the cloud at https://fabien.benetou.fr/Content/SelfHostingArtificialIntel...
Feel free to check it out and if there is something I can clarify, happy to try.
I think the main use case is home automation. You don't want details of your home setup leaking out.
Genuine question: If I were to fine-tune a model with 10 years of business data in a competitive space, would you feel safe with cloud training?
If you already have those 10 years of business data on Microsoft or Google services or their respective clouds, are you feeling safe?
I'm not a lawyer but technically most if not all cloud providers, specific to AI ("neo-cloud") or not, provide customer-managed encryption keys (CMEK), as someone else pointed out.
That being said if I were to be in such a situation, and if somehow the guarantees wouldn't be enough then I'd definitely expect to have the budget to build my own data center with GB300 or TPUs. I can't imagine that running it on a Mac Studio.
People store that data in databases in the same data centre so it's really the same level of trust needed that your provider adheres to the no training on your data. Trust and lawyers.
I'm not a big fan of reducing computing as a whole to just inference. Apple has done quite a bit besides that and it deserves credit. Mac Pro disappearing from the product line is a testament to it, that their compact solutions can cover all needs, not just local inference, to a degree that an expandable tower is not required at all.
Their compact solution doesn't cover all needs, they just decided that they didn't care about some of those needs. The Intel Mac Pro was the last Apple offering with high end GPU capabilities. That's now a market segment they just aren't supporting at all. They didn't figure out how to do it compactly, they just abandoned it wholesale.
Similarly if your use case depends on a whole lot of fast storage (eg, the 4x NVME to PCI-E x16 bifurcation boards), well that's also now something Apple just doesn't support. They didn't figure out something else. They didn't do super innovative engineering for it. They just walked away from those markets completely, which they're allowed to do of course. It's just not exactly inspiring or "deserves credit" worthy.
You could argue they abandoned that market long before (around the era of the Mac Pro trashcan). Along with the pro software.
They can abandon it multiple times ;)
When they introduced the cheese grater Mac Pro the new high end GPUs were a showcase feature of it. Complete with the bespoke "Duo" variants and the special power connector doohickey (MPX iirc?). So I'd consider that an attempt to re-enter that market at least.
> Mac Pro disappearing from the product line is a testament to it
Apple removing/adding something to their product line matters nothing, for all we know, they have a new version ready to be launched next month, or whatever. Unless you work at Apple and/or have any internal knowledge, this is all just guessing, not a "testament" to anything.
Did you read the article?
“Apple has also confirmed to 9to5Mac that it has no plans to offer future Mac Pro hardware.”
I did indeed! Did you read the article? Did you like it? Have you also read the HN guidelines by any chance?
None the less, what Apple says or doesn't say doesn't really matter. If their plan for a new Mac Pro is secret, they'll answer exactly that when someone asks them about it. Doesn't mean we won't see new Mac Pro hardware this summer. Plenty of cases in the past where they play coy and then suddenly, "whoops, we just had to keep it a secret, never mind".
CUDA 13 on Linux solves the unified memory problem via HMM and llamacpp. It’s an absolute pain to get running without disabling Secure Boot, but that should be remedied literally next month with the release of Ubuntu 26.04 LTS. Canonical is incorporating signed versions of both the new Nvidia open driver and CUDA into its own repo system, so look out for that. Signed Nvidia modules do already exist right now for RHEL and AlmaLinux, but those aren’t exactly the best desktop OSes.
But yeah, right now Apple actually has price <-> performance captured a lot if you’re buying a new computer just in general.
To me there is a fundamental difference. Even if PC hardware costs slightly more (now because of the RAM situation; Apple, producing its chips in house, can get better deals of course), it's something that is worth investing more in.
Maybe you spend 1000$ more for a PC of comparable performance, but tomorrow when you need more power, you change or add another GPU, add more RAM, add another SSD. A workstation you can keep upgrading for years, adding a small cost for an upgrade in performance.
An Apple machine is basically throw away: no component inside can be upgraded, you need more RAM? Throw it away and buy a new one. You want a new GPU technology? You have to change the whole thing. And if something inside breaks? You of course throw away the whole computer since everything is soldered on the mainboard.
There is then the software issue, with Apple devices you are forced to use macOS that kind of sucks, especially for a server usage. True, nowadays you can install Linux on it, but the GPU is not that well supported, thus you lose all the benefits. You are stuck with an OS that sucks, while in the PC market you have plenty of OS choices, Windows, a million of Linux distributions, etc. If I need a workstation to train LLMs, why do I care about an OS with a GUI? It's only a waste of resources, I just need a thing that runs Linux and I can SSH into it. Also I don't get the benefit of using containers, Docker, etc.
Macs suck even on the hardware side from a server point of view: for example, it's not possible to rack mount them, it's not possible to have redundant PSUs, they don't offer remote KVM capability, etc.
"Upgrades" haven't been a thing for nearly a decade. By the time you want to upgrade a machine part (c. 5yr+ for modern machines), you'd want to upgrade everything, and it's cheap to do so.
It isn't 2005 any more, when RAM/CPU/etc. progress made upgrading every 6mo worthwhile. It's closer to 6yr to really notice.
> By the time you want to upgrade a machine part (c. 5yr+ for modern machines), you'd want to upgrade everything,
That's only the case for CPU/MB/RAM, because the interfaces are tightly coupled (you want to upgrade your CPU, but the new one uses an AM5 socket so you need to upgrade the motherboard, which only works with DDR5 so you need to upgrade your RAM). For other parts, a "Ship of Theseus" approach is often worth it: you don't need to replace your 2TB NVMe M.2 storage just because you wanted a faster CPU, you can keep the same GPU since it's all PCIe, and the SATA DVD drive you've carried over since the early 2000s still works the same.
Even this is understating it; if you buy at the right point in the cycle, you can Ship-of-Theseus quite a while. An AM4 motherboard released in Feb 2017 with a Ryzen 1600X CPU, DDR4 memory and a GTX 780 Ti would be an obsolete system by today's standards. Yet, that AM4 motherboard can be upgraded to run a Ryzen 5800X3D CPU, the same (or faster) DDR4 memory, and an RTX 5070 Ti GPU and be very competitive with mid-tier 2026 systems containing all new components. Throughout all this, the case, PSU, cooling solution, and storage could all be maintained, and only replaced when individual components fail.
I expect many users would be happy with the above final state through 2030, when the AM6 socket releases. That would be 13 years of service for that original motherboard, memory, case and ancillary components. This is an extreme case, you have to time the initial purchase perfectly, but it is possible.
You can keep CPU and RAM for way longer than the GPU if you game...
Your point kind of disproves your point.
https://store.steampowered.com/hwsurvey/videocard/
That's news to me. I see Mac Minis with external drives plugged in constantly; I bet those people would appreciate user-serviceable storage. I doubt they bought an external drive because they wanted to throw away the whole computer.
Mac minis have user serviceable storage: https://store.m4-ssd.com/products/third-party-ssd-for-mac-mi...
An external drive is a workaround for Apple’s pricing scheme, often purchased at the same time as the computer.
> you need more RAM? Throw it away and buy a new one.
Or sell it, which is much easier to do with Macs because they're known quantities and not "Acer Onyx X321 Q-series Ultra".
> There is then the software issue, with Apple devices you are forced to use macOS that kind of sucks, especially for a server usage
That's a fair point. Apple would get a ton of goodwill if they released enough documentation to let Asahi keep up with new hardware. I can't imagine it would harm their ecosystem; the people who would actually run Linux are either not using Macs at all, or users like me who treat them as Unix workstations and ignore their lock-in attempts.
I think most of that is really opinion and experiences. No doubt it’s not designed or built truly for racks but folks have been making rack mounts for Mac minis since they first came out.
On the upgrade path, I don’t think upgrades are truly a thing these days. Aside from storage, for most components, by the time you get to whatever your next cycle is, it’s usually best/easiest to refresh the whole system unless you underbought the first time around.
> Macs suck even on the hardware side from a server point of view: for example, it's not possible to rack mount them, it's not possible to have redundant PSUs, they don't offer remote KVM capability, etc.
https://atp.fm/683
As others have said, that's just not the reality of a modern work machine. If I need a new GPU or more RAM, I'm positive I need everything else upgraded too
> with Apple devices you are forced to use macOS that kind of sucks, especially for a server usage
you can just install linux?
Only really possible with the M1. If referring to Asahi.
> You are stuck with an OS that sucks, while in the PC market you have plenty of OS choices, Windows, a million of Linux distributions
Windows is 10x more enshittified than OSX
> An Apple machine is basically throw away: no component inside can be upgraded, you need more RAM? Throw it away and buy a new one.
Tell that to all the people rocking 5-10 year old MacBooks that still run great
Agreed. I’m planning on selling my 512GB M3 Ultra Studio in the next week or so (I just wrenched my back so I’m on bed-rest for the next few days) with an eye to funding the M5 Ultra Studio when it’s announced at WWDC.
I can live without the RAM for a couple of months to get a good price for it, especially since Apple don’t sell that model (with the RAM) any more.
Just out of curiosity, where do you think is the best place to sell a machine like that with the lowest risk of being scammed, while still getting the best possible price?
Wish you a speedy recovery for your back!
> Just out of curiosity, where do you think is the best place to sell a machine like that with the lowest risk of being scammed, while still getting the best possible price?
There are none currently on eBay.co.uk, so I'm going to try there. I'll also try some of the reddit UK-specific groups.
As far as not being scammed - it's a really high value one-off sale, so it'll either be local pickup (and cash / bank-transfer at the time, which happens in seconds in the UK) or escrow.com (for non-eBay) with the buyer paying all the fees etc.
I'd prefer local pickup because then I have the money, the buyer can see it working, verify everything to their satisfaction etc. etc.
> Wish you a speedy recovery for your back!
Thank you :) It is a little better today. Sitting down is now tolerable for short periods... :)
Doesn't escrow.com charge a minimum fee of 50 dollars/pounds?
I do know that Escrow.com is one of the most reputable escrow platforms. On a more personal note, I would love to know of an escrow service where I can just sell the spare domains I have (I got some .com/.net domains for 1$ each back during a provider's deal). Is there any particular escrow service that doesn't charge a lot, so I can get a few dollars from selling them? Some of those domains aren't being used by me.
> Thank you :) It is a little better today. Sitting down is now tolerable for short periods... :)
I am wishing you speedy recovery as well. A cowboy gotta have a strong back :-)
According to the calculator, it’d be about £280 assuming the purchase cost was £11k. I think that’s probably an upper-bound on the sale-price, though I can see bids of $20k on eBay.com for the same model.
I sold a domain via escrow.com a long time ago now (20 years or so) but the buyer paid fees, so I don’t know what they charge for that. You could try the calculator they have though (https://www.escrow.com/fee-calculator)
And thanks for the good wishes :)
Probably eBay
Lowest risk is probably an Apple trade-in if available, but I can't imagine how bad of a price hit it will be.
I checked, it's terrible. They don't take into account the size of the RAM in the machine, so you get the base-model trade-in value (£1280). Yeah, no.
sounds like 100% risk of getting scammed
Hey didn't they drop the 512 GB model?
https://appleinsider.com/articles/26/03/06/forget-512gb-ram-...
You may want to hold on to your M3 Ultra! There's no guarantee there will be an M5 Ultra with 512 GB of RAM.
I don’t actually use the memory anywhere near as much as I thought I would. 256GB would be fine for me :)
Heh, my main "heavy stuff" desktop only has 64GB.
But it feels really good to have more ram than you can think of a use for.
I have a faint memory of an interview ages ago with Knuth, I think, where he mentioned as an aside he was using a workstation with 3.2 GB of storage and 4 GB of RAM :)
Around the year 2001 I recall watching 3D Studio Max R3 tutorials in which the teacher had an electric purple desktop which possessed an entire 4 gigs of RAM. It blew my mind. My computer had 128 MB and an ATI Rage 128 Pro.
I was young and dumb and never would have guessed I'd own a computer with 32 GB of RAM that felt pitifully underpowered for today's tasks.
Humm purple and 4 gigs of ram in 2001 sounds like SGI. But those purple SGIs ran Irix so no 3d studio.
You're right! Crazy, that brings me back. I wonder why he showed it off. I wish I could find it. He probably wasn't using it for the tutorial at all, just nerding out and talking about how beefy computers handle rendering and complex geometry better.
I was constantly constrained by my computers back then. Trying to navigate complex scenes or model very detailed meshes could get soooo slow. But man I loved it so much.
> I wonder why he showed it off.
Probably because it ran Maya, which was an SGI product back then, not an Autodesk product yet.
As to better or cheaper homelab: depends on the build. AMD AI Max builds do exist, and they also use unified memory. I could argue the competition was, for a long time, selling much more affordable RAM, so you could get a better build outside Apple Silicon.
The typical inference workloads have moved quite a bit in the last six months or so.
Your point would have been largely correct in the first half of 2025.
Now, you're going to have a much better experience with a couple of Nvidia GPUs.
This is because of two reasons - the reasoning models require a pretty high number of tokens per second to do anything useful. And we are seeing small quantized and distilled reasoning models working almost as well as the ones needing terabytes of memory.
The interesting question is whether they'll lean into it intentionally (better tooling, more ML-focused APIs) or just keep treating it as a side effect of their silicon design
I think we’ll see a much more robust ecosystem develop around MLX now that agentic coding has reduced the barrier of porting and maintaining libraries to it.
Apple abandoned the pro market long before ever releasing the current iteration of the Mac Pro. I doubt they care about getting it back, considering it's a smaller niche of consumers and probably significantly more investment on the software side.
At best we probably get a chassis to awkwardly daisy chain a bunch of Mac Studios together
For LLMs and other purely memory-bound workloads, yes, but for e.g. diffusion models their FPU SIMD performance is lacking.
The new M chips beat basically any PC on video editing. Their new ProRes accelerator chiplet is so good they can’t even compete.
Good luck storing all those 8K videos, plates, and other content on a soldered-in SSD
What part of your workflow relies on home LLM inference?
Just a reminder that the old Intel Mac Pro could handle 1.5TB of RAM ... today's Mac Studio can only handle 0.25TB.
Seems odd that a computer from a decade ago could take more than 1TB of additional RAM vs what we can buy today from Apple.
The M5 Ultra Studio may support more as it becomes a replacement for the Mac Pro.
> home inference machines.
The market for this use case is tiny
For now. In a few years it will be part of everyday life, because people will see Apple users enjoying it without thinking about it. You won’t consider it a “home inference machine,” just a laptop with more capabilities than any other vendor offers without a cloud subscription.
The average person self-hosts literally nothing, so why would it be different for inference, which benefits severely from economies of scale and efficient 24/7 utilization?
I do love the Mac Studio. I had a 2019 Mac Pro, the Intel cheesegrater, but my home office upstairs became unpleasant with it pushing out 300W+. I replaced it with the M2 Ultra Studio for a fraction of the heat output (though I did have to buy an OWC 4xNVMe bay).
> I bet there’s gonna be a banger of a Mac Studio announced in June. Apple really stumbled into making the perfect hardware for home inference machines.
This I'm not actually as sure about. The current Studio offerings have done away with the 512GB memory option. I understand the RAM situation, but they didn't change pricing they just discontinued it. So I'm curious to see what the next Studio is like. I'd almost love to see a Studio with even one PCI slot, make it a bit taller, have a slide out cover...
how about the newly announced GB300 DGX Workstation?
Comparing a $100K workstation to a $4K desktop PC seems a bit Apples and oranges?
Framework offers the AI Ryzen Max with ̶1̶9̶6̶G̶B̶ 128GB of unified RAM for 2,699$
That's a pretty good deal I would think
https://frame.work/de/de/products/desktop-diy-amd-aimax300/c...
The Framework Desktop is quite cool, but those Ryzen Max CPUs are still a pretty poor competitor to Apple's chips if what you care about is running an LLM. Ryzen Max tops out at 256 GB/s of memory bandwidth, whereas an M4 Max can hit 560 GB/s of bandwidth.
So even if the model fits in the memory buffer on the Ryzen Max, you're still going to hit something like half the tokens/second just because the GPU will be sitting around waiting for data.
Personally, I'd rather have the Framework machine, but if running local LLMs is your main goal, the offerings from Apple are very compelling, even when you adjust for the higher price on the Apple machine.
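The bandwidth argument can be sketched as a simple upper bound: during decoding, each token needs roughly one pass over the active weights, so tokens/second is at most memory bandwidth divided by the bytes read per token. The 40 GB weight footprint below is an arbitrary assumption; the bandwidth figures are the ones quoted above, and real throughput will be lower once compute and overhead are counted.

```python
def decode_tokens_per_second(bandwidth_gb_s, weights_gb):
    """Bandwidth-bound ceiling on decode speed for a single stream.

    Assumes every token reads weights_gb of weights exactly once.
    """
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 40.0  # hypothetical quantized model size (assumption)
    ryzen = decode_tokens_per_second(256.0, weights_gb)
    m4_max = decode_tokens_per_second(560.0, weights_gb)
    print(f"Ryzen Max ceiling: {ryzen:.1f} tok/s")
    print(f"M4 Max ceiling:    {m4_max:.1f} tok/s")
    print(f"ratio: {ryzen / m4_max:.2f}")
```

The ratio does not depend on the model size chosen, which is why "something like half the tokens/second" falls straight out of the two bandwidth numbers.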
There's also the DGX Spark. Granted, its price has been going up recently alongside everything else that has memory in it.
I haven't heard a single good thing about DGX Spark from anyone using it, so I'd be pretty wary about that.
That also has pretty poor memory bandwidth. 283GB/s I think.
Yeah. The main selling point I'd say is the onboard ConnectX-7 hardware.
128GB is the max RAM that the current Strix Halo supports, with ~250GB/s of bandwidth. The Mac Studio is 256GB max and ~900GB/s of memory bandwidth. They are in different categories of performance, and even performance per dollar is worse. (~$2700 for Framework Desktop vs $7500 for Mac Studio M3 Ultra)
128GB*
Thanks for spotting the mistake. No idea how I got to 192.
For what it's worth, I really wish that was the actual number.
Still, running 2 to 4 5090 will beat anything Apple has to offer for both inference and training.
I would say 1-2 RTX 6000 Pro maxQ are more practical.
That won’t work for the home hobbyist: that's 2.4 kW of GPU alone, plus a 350W Threadripper Pro with enough PCIe lanes to feed them. You’re looking at close to twice the capacity of an average US household electrical circuit just to run the machine under load.
A cluster of 4 of Apple’s M3 Ultra Mac Studios, by comparison, will consume close to 1100W under load.
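A quick sanity check of that arithmetic, as a sketch. The wattages are the ones from the comment above; the 120V / 15A circuit and the 80% continuous-load derating are my assumptions about a typical US branch circuit, not something stated in the thread.

```python
# Assumptions (not from the thread): a common 120 V / 15 A residential
# branch circuit, and the usual 80% rule of thumb for continuous loads.
CIRCUIT_VOLTS = 120
CIRCUIT_AMPS = 15
CONTINUOUS_DERATE = 0.8


def circuit_capacity_watts(volts: float, amps: float, derate: float = 1.0) -> float:
    """Power a circuit can deliver, optionally derated for continuous load."""
    return volts * amps * derate


# Figures from the comment above.
gpu_rig_watts = 2400 + 350      # four GPUs plus the CPU
mac_cluster_watts = 1100        # four Mac Studios under load

peak_watts = circuit_capacity_watts(CIRCUIT_VOLTS, CIRCUIT_AMPS)
continuous_watts = circuit_capacity_watts(CIRCUIT_VOLTS, CIRCUIT_AMPS, CONTINUOUS_DERATE)

for label, load in [("GPU rig", gpu_rig_watts), ("Mac cluster", mac_cluster_watts)]:
    print(
        f"{label}: {load} W = {load / peak_watts:.2f}x peak, "
        f"{load / continuous_watts:.2f}x continuous capacity"
    )
```

Under these assumptions the GPU rig lands at roughly 1.5x the peak and 1.9x the continuous rating of a single circuit, while the Mac cluster fits inside one.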
I mean if a hobbyist can run a welder or cnc machine in their home workshop...
> Apple really stumbled into making the perfect hardware for home inference machines
Apple are winning a small battle for a market that they aren’t very good in. If you compare the performance of a 3090 and above vs any Apple hardware you would be insane to go with the Apple hardware.
When I hear someone say this it’s akin to hearing someone say Macs are good for gaming. It’s such a whiplash from what I know to be reality.
Or another jarring statement - Sam Altman saying Mario has an amazing story in that interview with Elon Musk. Mario has basically the minimum possible story to get you to move the analogue sticks. Few games have less story than Mario. Yet Sam called it amazing.
It’s a statement from someone who just doesn’t even understand the first thing about what they are talking about.
Sorry for the mini rant. I just keep hearing this Apple thing over and over and it’s nonsense.
I don't think Apple just stumbled into it, and while I totally agree that Apple is killing it with their unified memory, I think we're going to see a pivot from NVidia and AMD. The biggest reason, I think, is: OpenAI has committed to an enormous amount of capex it simply cannot afford. It does not have the lead it once did, and most end-users simply do not care. There are no network effects. Anthropic at this point has completely consumed, as far as I can tell, the developer market. The one market that is actually passionate about AI. That's largely due to a huge advantage of the developer space: end users cannot tell whether an "AI" coded it or a human did. That's not true for almost every other application of AI at this point.
If the OpenAI domino falls, and I'd be happy to admit if I'm wrong, we're going to see a near catastrophic drop in prices for RAM and demand by the hyperscalers to well... scale. That massive drop will be completely and utterly OpenAI's fault for attempting to bite off more than it can chew. In order to shore up demand, we'll see NVidia and AMD start selling directly to consumers. We, developers, are consumers and drive demand at the enterprises we work for based on what keeps us both engaged and productive... the end result being: the ol' profit flywheel spinning.
Both NVidia and AMD are capable of building GPUs that absolutely wreck Apple's best. A huge reason for this is Apple needs unified memory to keep their money maker (laptops) profitable and performant; and while it helps their profitability, it also forces them into less performant solutions. If NVidia dropped a 128GB GPU with GDDR7 at $4k-- absolutely no one would be looking for a Mac for inference. My 5090 is unbelievably fast at inference even if it can't load gigantic models, and quite frankly the 6-bit quantized versions of Qwen 3.5 are fantastic, but if it could load larger open weight models I wouldn't even bother checking Apple's pricing page.
tldr; competition is as stiff as it is vicious-- Apple's "lead" in inference is only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from Apple.
> A huge reason for this is Apple needs unified memory to keep their money maker (laptops) profitable and performant
None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
The "weird" thing Apple is doing is using normal DDR5 with a wider-than-normal memory bus to feed their GPUs instead of using GDDR or HBM. The disadvantage of this is that it has less memory bandwidth than GDDR for the same width of the memory bus. The advantage is that normal RAM costs less than GDDR. Combined with the discrete GPU market using "amount of VRAM" as the big feature for market segmentation, a Mac with >32GB of "VRAM" ended up being interesting even if it only had half as much memory bandwidth, because it still had more than a typical PC iGPU.
The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.
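For anyone who wants the bus-width argument as arithmetic: peak bandwidth is just bus width times transfer rate. A minimal sketch; the three configurations below are illustrative assumptions I picked to show the shape of the tradeoff, not vendor specs for any particular product.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_mt_s: float) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s / 1000  # MB/s -> GB/s


# Illustrative configurations (assumed): (bus width in bits, data rate in MT/s).
CONFIGS = {
    "typical PC, 128-bit DDR5-5600": (128, 5600),
    "wide-bus SoC, 512-bit LPDDR5X-8533": (512, 8533),
    "discrete GPU, 512-bit GDDR7 at 28000 MT/s": (512, 28000),
}

bandwidths = {name: peak_bandwidth_gb_s(*cfg) for name, cfg in CONFIGS.items()}
for name, gb_s in bandwidths.items():
    print(f"{name}: {gb_s:.1f} GB/s")
```

At the same 512-bit width the GDDR-class memory comes out several times faster, but the wide DDR-class bus is still far ahead of a 128-bit desktop memory system, which matches the comment above.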
> None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
the bottleneck in lots of database workloads is memory bandwidth. for example, hash join performance with a build side table that doesn't fit in L2 cache. if you analyze this workload with perf, assuming you have a well written hash join implementation, you will see something like 0.1 instructions per cycle, and the memory bandwidth will be completely maxed out.
similarly, while there have been some attempts at GPU accelerated databases, they have mostly failed exactly because the cost of moving data from the CPU to the GPU is too high to be worth it.
i wish aws and the other cloud providers would offer arm servers with apple m-series levels of memory bandwidth per core, it would be a game changer for analytical databases. i also wish they would offer local NVMe drives with reasonable bandwidth - the current offerings are terrible (https://databasearchitects.blogspot.com/2024/02/ssds-have-be...)
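For readers who haven't written one, here is a minimal sketch of the hash join being described: a build phase that fills a hash table from one side, then a probe phase that looks up every row of the other side. The table and column names are made up. Python won't reproduce the IPC figure, it only shows the access pattern: each probe lands at an effectively random spot in the table, so once the table outgrows cache, nearly every lookup goes to main memory.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner equi-join: build a hash table on one input, probe it with the other."""
    # Build phase: one pass over the (ideally smaller) build side.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: one hash lookup per probe row, at an unpredictable location.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            joined.append({**match, **row})
    return joined


# Hypothetical example data.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Lin"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 1},
    {"order_id": 13, "cust_id": 3},  # no matching customer, dropped by the join
]

result = hash_join(customers, orders, "cust_id", "cust_id")
for row in result:
    print(row)
```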
> the bottleneck in lots of database workloads is memory bandwidth.
It can be depending on the operation and the system, but database workloads also tend to run on servers that have significantly more memory bandwidth:
> i wish aws and the other cloud providers would offer arm servers with apple m-series levels of memory bandwidth per core, it would be a game changer for analytical databases.
There are x64 systems with that. Socket SP5 (Epyc) has ~600GB/s per socket and allows two-socket systems, Intel has systems with up to 8 sockets. Apple Silicon maxes out at ~800GB/s (M3 Ultra) with 28-32 cores (20-24 P-cores) and one "socket". If you drop a pair of 8-core CPUs in a dual socket x64 system you would have ~1200GB/s and 16 cores (if you're trying to maximize memory bandwidth per core).
The "problem" is that system would take up the same amount of rack space as the same system configured with 128-core CPUs or similar, so most of the cloud providers will use the higher core count systems for virtual servers, and then they have the same memory bandwidth per socket and correspondingly less per core. You could probably find one that offers the thing you want if you look around (maybe Hetzner dedicated servers?) but you can expect it to be more expensive per core for the same reason.
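The per-core numbers implied above, as a quick sketch. Bandwidth and core counts are the approximate figures from this comment (two SP5 sockets at ~600GB/s each); the dictionary labels are just mine.

```python
def bandwidth_per_core(total_gb_s: float, cores: int) -> float:
    """Memory bandwidth available per core if shared evenly."""
    return total_gb_s / cores


# (total bandwidth in GB/s, core count), taken from the comment above.
SYSTEMS = {
    "M3 Ultra, 32 cores": (800, 32),
    "dual SP5, 2 x 8 cores": (1200, 16),
    "dual SP5, 2 x 128 cores": (1200, 256),
}

per_core = {name: bandwidth_per_core(bw, cores) for name, (bw, cores) in SYSTEMS.items()}
for name, gb_s in per_core.items():
    print(f"{name}: {gb_s:.1f} GB/s per core")
```

Same sockets, same total bandwidth, but the high core count configuration that cloud providers prefer ends up with a small fraction of the per-core figure.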
>The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.
Apple needs to solder it because they are attaching it directly to the SOC to minimize lead length and that is part of how they are able to get that bandwidth.
Systems with socketed RAM have had on-die memory controllers for more than two decades. CAMM2 supports the same speeds as Apple is using in the M5.
Except they don't use DDR5. LPDDR5 is always soldered. LPDDR5 requires short point-to-point connections to give you good signal integrity (SI) at high speeds and low voltages. To get the same with DDR5 DIMMs, you'd have something physically much bigger, with way worse SI, with higher power, and with higher latency. That would be a much worse solution. GDDR is much higher power, so the solution would end up bigger. Plus it's useless for system memory, so now you need two memory types. LPDDR5 is the only sensible choice.
> LPDDR5 is always soldered.
No it isn't:
https://www.newegg.com/crucial-32gb-ddr5-7500-cas-latency-cl...
CAMM2 is new and most of the PC companies aren't using it yet but it's exactly the sort of thing Apple used to be an early adopter of when they wanted to be.
It looks like LPCAMM2 is shipping from one vendor and only started shipping in October; that’s a bit quick and early for Apple to adopt.
Is it really useless for system memory or is it just too expensive and no manufacturer has bothered?
> Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.
Isn't that also because that's the world we have optimized workloads for?
If the common hardware had unified memory, software would have exploited that, I imagine. Hardware and software are in a co-evolutionary loop.
Sort of?
Part of the problem is that there is actually a reason for the distinction, because GPUs need faster memory but faster memory is more expensive, so then it makes sense to have e.g. 8GB of GDDR for the GPU and 32GB of DDR for the CPU, because that costs way less than 40GB of GDDR. So there is an incentive for many systems to exist that do it that way, and therefore a disincentive to write anything that assumes copying between them is free because it would run like trash on too large a proportion of systems even if some large plurality of them had unified memory.
A sensible way of doing this is to use a cache hierarchy. You put e.g. 8GB of expensive GDDR/HBM on the APU package (which can still be upgraded by replacing the APU) and then 32GB of less expensive DDR in slots on the system board. Then you have "unified memory" without needing to buy 40GB of GDDR. The first 8GB is faster and the CPU and GPU both have access to both. It's kind of surprising that this configuration isn't more common. Probably the main thing you'd need is for the APU to have a direct power connector like a GPU so you're not trying to deliver most of a kilowatt through the socket in high end configurations, but that doesn't explain why e.g. there is no 65W CPU + 100W GPU with a bit of GDDR to be put in the existing 170W AM5 socket.
However, even if that was everywhere, it still doesn't necessarily imply there are a lot of things that could do much with it. You would need something that simultaneously requires more single-thread performance than you can get from a GPU, more parallel computation than you can get from a high-end CPU, and requires a large amount of data to be repeatedly shared between those subsets of the computation. Such things probably exist but it's not obvious that they're very common.
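A rough model of the tiered arrangement described above (a small fast pool backed by a larger slow pool), as a sketch under simplifying assumptions: transfers from the two tiers don't overlap, and the bandwidth figures are hypothetical round numbers, not measurements. The effective bandwidth is then a weighted harmonic mean, so it falls off quickly once the working set spills out of the fast tier.

```python
def effective_bandwidth_gb_s(fast_fraction: float, fast_gb_s: float, slow_gb_s: float) -> float:
    """Effective bandwidth when a fraction of bytes comes from the fast tier.

    Assumes the two tiers are read one after the other (no overlap), so the
    time per byte is the weighted sum of the per-tier times.
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    time_per_gb = fast_fraction / fast_gb_s + (1.0 - fast_fraction) / slow_gb_s
    return 1.0 / time_per_gb


# Hypothetical tiers: 1000 GB/s on-package memory, 100 GB/s slotted DDR.
FAST_GB_S = 1000.0
SLOW_GB_S = 100.0

for fraction in (1.0, 0.9, 0.5, 0.0):
    bw = effective_bandwidth_gb_s(fraction, FAST_GB_S, SLOW_GB_S)
    print(f"{fraction:.0%} of traffic from fast tier: {bw:.0f} GB/s effective")
```

With these made-up numbers, even a 90% hit rate in the fast tier only gets you a bit over half of its bandwidth.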
> tldr; competition is as stiff as it is vicious-- Apple's "lead" in inference is only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from Apple.
These companies always try to preserve price segmentation, so I don’t have high hopes they’d actually do that. Consumer machines still get artificially held back on basic things like ECC memory, after all . . .
Nvidia is definitely preparing for this with the open-source LLMs they are currently developing.
No one cares about Metal in that space, plus CUDA has already had unified memory for a while.
https://docs.nvidia.com/cuda/cuda-programming-guide/04-speci...
Can we also stop giving Apple some prize for unified memory?
It was the way of doing graphics programming on home computers, consoles and arcades, before dedicated 3D cards became a thing on PC and UNIX workstations.
Can we please stop treating this like some 2000s Mac vs PC flame war where you feel the need to go full whataboutism whenever anyone acknowledges any positive attribute of any Apple product? If you actually read back over the comments you’re replying to, you’ll see that you’re not actually correcting anything that anyone actually said. This shit is so tiring.
You mean like the Neo marketing materials put out by Apple?