Hacker News

freediddy a day ago [ - ]

In the last year, I have bought an M3 Ultra Mac Studio with 512 GB, a MacBook Pro M5 Max with 128 GB and an RTX 6000 Pro. I have spent around $25k so far, not including electricity. I figured worst case scenario I can sell them in the next year and only take a haircut as opposed to losing my entire investment.

Compared with running locally, just paying for tokens would have been much cheaper and much, much faster. I've been running Gemma4:31b and Qwen3.5 and 3.6, getting local LLMs to solve AMC 8/10 math questions, and it's about 10-100x slower than just doing it online. When I tried it with ChatGPT late last year, it took about one night and $25 to solve about 1000 questions. Using my RTX 6000 and M3 Ultra with Gemma4:31b on both, I got about 40 questions answered in 7 hours at roughly 800 watts (600 for the RTX and 200 for the M3 Ultra), and I haven't checked how good the answers are yet.
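For a rough sense of scale, the figures in this comment can be put into a short Python sketch. The question counts, hours, wattage and the $25 ChatGPT bill come from the comment; the electricity price is a hypothetical placeholder that nobody in the thread stated.

```python
# Figures quoted in the comment above.
LOCAL_QUESTIONS = 40
LOCAL_HOURS = 7
LOCAL_WATTS = 800            # 600 W (RTX 6000) + 200 W (M3 Ultra)
CLOUD_QUESTIONS = 1000
CLOUD_COST_USD = 25.0

# Assumption: hypothetical electricity price, NOT taken from the comment.
ASSUMED_USD_PER_KWH = 0.15

# Time and energy spent per locally answered question.
minutes_per_question = LOCAL_HOURS * 60 / LOCAL_QUESTIONS
energy_kwh = LOCAL_WATTS / 1000 * LOCAL_HOURS
kwh_per_question = energy_kwh / LOCAL_QUESTIONS

# Electricity-only cost locally versus the API bill, per question.
local_energy_usd_per_question = kwh_per_question * ASSUMED_USD_PER_KWH
cloud_usd_per_question = CLOUD_COST_USD / CLOUD_QUESTIONS

print(f"local: {minutes_per_question:.1f} min/question, {energy_kwh:.1f} kWh total")
print(f"local electricity: ${local_energy_usd_per_question:.3f}/question (assumed price)")
print(f"cloud: ${cloud_usd_per_question:.3f}/question")
```

Under that assumed price, electricity alone lands in the same ballpark per question as the cloud bill, before counting any of the hardware cost.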

At the very least I'm going to try to sell my M3 Ultra if I can find a reliable place to sell it without getting ripped off by scammers.

miki123211 20 hours ago [ - ]

This is, sadly, obvious and inevitable in retrospect.

The two major drivers of inference costs are GPUs and electricity. You can't get cheaper GPUs, but you can make existing GPUs not sit idle, and you do that by utilizing them 24/7, processing user B's request when user A is thinking, and handling many requests in parallel, neither of which you can do as an individual. You can get cheaper electricity... by moving, and it's much easier to move your AI workload than to move yourself.

This is a completely different dynamic than renting houses or apartments, as you can't really rent out the same house to different people at different times of day.

cootsnuck 19 hours ago [ - ]

Yea. LLM inference requires batch processing to have a shred of hope at being cost efficient. Batch processing requires a not so insignificant amount of scale (but probably not as much as people think).

I'm very pro local models, but not to have parity with SoTA frontier models. Just contextually trained small models doing smaller specific tasks.

Trying to run bigger LLMs for an individual user to do big tasks is not going to be a good time.

MichaelZuo 4 hours ago [ - ]

Wasn't this pretty evident to pretty much anyone who knew even a bit about inferencing?

Idk what people were thinking. I’ve never seen anyone offer a plausible way to sidestep batch processing for example.

zozbot234 17 hours ago [ - ]

You can definitely run many requests in parallel as a single user, you just have to be OK with a significant slowdown for any single request. Cloud inference can't reach that ratio of total throughput per hardware cost since they are heavily incented to get the most expensive hardware available and to then minimize latency (and RAM occupation over time) even at the cost of throughput. Running slower inference with cheaper hardware is just not workable in a cloud setting.

PowerElectronix 15 hours ago [ - ]

On top of that, AI providers are also eating a big loss on the service.

tempay 15 hours ago [ - ]

Are they? I only ever see unsubstantiated claims for this whereas I see many justifications that inference is comfortably profitable in isolation.

tomelders 10 hours ago [ - ]

SpaceX has disclosed in their IPO documents that they're losing $2bn a quarter on AI, and rising.

Anthropic told the Department of War-nee-Defence that they'd made $5bn total, which is a LOT less than what they're spending.

We'll see what's in OpenAI's IPO later this year I guess. I'll be very surprised if they're losing less than $100bn a year.

koliber 3 hours ago [ - ]

Is it capex of training new models and hiring people for 250mln pay packages? Or is it opex running inference?

ai_fry_ur_brain 15 hours ago [ - ]

It's basic math: go calculate max sessions for a certain tps on any hardware. Session# * tps * 86400 (secs in a day) * 30 days.

You'll realize real quick it's not profitable. You can't just say things you don't like to hear are unsubstantiated without verifying.

Not to mention subscriptions: $2mm in GPUs being given out for 5 hrs a day at a cost of $200 a month.

I could easily say that everyone who says it's profitable is making unsubstantiated claims lol.
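The formula in this comment can be written out as a small Python sketch. It only gives an upper bound on sellable tokens at 100% utilization; the session count, tokens-per-second and price used in the example are hypothetical values chosen for illustration, not measurements of any real deployment.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def max_tokens_per_month(sessions: int, tokens_per_second: float) -> float:
    """Upper bound: every session generates tokens nonstop for 30 days."""
    return sessions * tokens_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def max_revenue_per_month(sessions: int, tokens_per_second: float,
                          usd_per_million_tokens: float) -> float:
    """Revenue if every one of those tokens were sold at a flat price."""
    tokens = max_tokens_per_month(sessions, tokens_per_second)
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs (assumptions, not from the thread):
# 64 concurrent sessions at 30 tokens/s each, sold at $3.00 per 1M tokens.
example_tokens = max_tokens_per_month(sessions=64, tokens_per_second=30)
example_revenue = max_revenue_per_month(64, 30, usd_per_million_tokens=3.00)

print(f"{example_tokens:,.0f} tokens/month at most")
print(f"${example_revenue:,.2f}/month at most")
```

Whether that ceiling covers the hardware depends on inputs the thread does not have, such as real concurrency, utilization and what the operator actually pays for GPUs.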

fauigerzigerk 14 hours ago [ - ]

>Its basic math

Yes, once you have modeled the problem correctly and you know all the input parameters. This is not that: Session# * tps * 86400 (secs in a day) * 30 days.

I don't think there is enough public information to check Anthropic's claims regarding inference profitability. It depends not just on unknown technical factors but also on agreements they have with other companies.

ai_fry_ur_brain 6 hours ago [ - ]

I agree that we don't know how expensive SOTA is. But yes, my math should give you the max amount of tokens you can sell per month, and it's not remotely profitable for most of the larger open source models (at their current pricing). I'm not sure why a 10x larger model that is more in demand would be profitable when it's only 5x the price.

It's possible you could pay off hardware for Kimi 2.6 after maybe 2-3 yrs (by providing low tps / high concurrency) but you're now out of warranty and have been running your machines full throttle 24/7 for 2-3 years.

This is why Moonshot attempted to double the price when they released 2.6 but then it got driven down by North American capital subsidies.

mr_mitm 14 hours ago [ - ]

We should specify which subscription plan we are talking about. You seem to be talking about the Anthropic Claude Max plan. I think the consensus is that these flat-rate subscriptions are loss leaders, as they come with T&C restrictions on how you can use the API, namely only with Claude Code et al. They are meant to hook developers into their products.

Shouldn't we compare the API pricing, where we pay per token? The whole point of local inference is that we don't have any restrictions regarding product use or time limits, so it would only be fair if we compare it to a plan that offers the same. And even that is only a first approximation, because the commercial models are usually much more capable than the open weight models.

mbesto 9 hours ago [ - ]

> I could easily say that everyone who says its profitible is msking unsubstantiated claims lol.

And people who don't understand the difference between capex and opex are making uneducated claims. It's not basic math.

Running an inference data center is a mix of variable and fixed costs. The fixed costs are currently in the billions and billions of dollars for pretty much any investment in this space. Many of those fixed costs have (currently) unknown refresh cycles. So, unless you have access to the financial books of these companies, it's currently just speculation whether inference is profitable.

adastra22 15 hours ago [ - ]

You got numbers? Because it seems perfectly possible to me. OpenAI and Anthropic’s marginal cost for inference is certainly far less than their API pricing.

callmeal 15 hours ago [ - ]

See: https://www.wheresyoured.at/ He's been "numbering" for quite a while now.

tempay 14 hours ago [ - ]

Everything there is extremely speculative and I don't see anything that contradicts that inference itself could be profitable at massive scale. See https://youtu.be/xmkSf5IS-zw for example.

Whether the companies as a whole are destined to be profitable, or worth their valuations, is a very different question. The only people who can truly answer that have time machines.

ai_fry_ur_brain 15 hours ago [ - ]

How can you say that with such certainty? You have no idea what it costs to run a 10T parameter model at extremely high concurrency.

These 1T param models running at <$3.00 per 1M tokens are certainly not profitable.

adastra22 14 hours ago [ - ]

Because I’ve looked at what it would cost my company to self-host a SOTA sized model. For us it wasn’t worth it because the hardware is all bought up by frontier labs and we can’t get any supply. But if we could, at the prices they’re paying, it would pay for itself in 10-ish months. I assume further that they have economies of scale on top of what I was estimating.

brightball 8 hours ago [ - ]

To some degree I think there's a hope that it becomes like a gym membership. If everybody used their membership, the gym would be too crowded. It's all of those memberships that people feel like they need to have but don't use where the extra profit comes in.

As long as the power users are paying per token, everything is good.

krupan 6 hours ago [ - ]

Really? This is what we expect from this amazing world changing technology? People will sign up for it and not use it? Good business plan, how can I invest? /s

brightball 5 hours ago [ - ]

Just speculating on the math.

exploderate 7 hours ago [ - ]

Especially since their costs might be multi-year investments. It's too early to judge the quality of those investments.

solumunus 15 hours ago [ - ]

Supposedly Anthropic just reported that they’re operationally profitable. So maybe not?

akho 14 hours ago [ - ]

"operationally" implies that capex (which I would assume includes datacenters, gpus, and r&d) is not in. So the big news is that they can now pay for electricity and sysadmin.

camdenreslink 9 hours ago [ - ]

I believe they also excluded stock-based compensation from their calculation, which could easily tip them in the non-profitable direction.

adrianN 19 hours ago [ - ]

Historically it was not uncommon for beds to be rented out to multiple people.

bredren 15 hours ago [ - ]

The word for this type of boarding is “flophouse.”

This is the type of place one might be “waiting for the other shoe to drop.” Which carries a variety of potential meanings in this moment of AI.

Tangentially related: Mack and the boys lived in the “Palace Flophouse and Grill” in Cannery Row.

I suppose I must have looked up flophouse when reading all the Steinbeck I could get my hands on and it’s stuck w me.

eecc 16 hours ago [ - ]

It is unfortunately still common practice among irregular agricultural workers in many parts of the world (I’m Italian so I definitely remember news about busts in southern Italy)

consp 16 hours ago [ - ]

See military submarines, for a modern version.

AdamN 12 hours ago [ - ]

Yeah there are good accounts of this in Down and Out in Paris and London and also one of Hemingway's books - forgot which one.

Unit327 16 hours ago [ - ]

It also doesn't help that they probably sell tokens below cost.

graemep 13 hours ago [ - ]

High usage seems to change the economics. The author of the article had a payback period of about 14 months which is excellent by any standards and an order of magnitude better than rent vs buy for a house in most places.

dpark 7 hours ago [ - ]

> You can't get cheaper GPUs

You absolutely can. OpenAI et al are paying a fortune for GPUs but they are not paying retail prices.

The entire business model of retail is to sell above cost.

jon-wood a day ago [ - ]

I’m not usually one to ask this because learning to do a thing can be fun, but why exactly have you spent 25 thousand dollars on getting an LLM someone else made to answer maths exam questions?

nickthegreek a day ago [ - ]

The cost is obviously not as big a factor for OP as it might be for others. It's actually refreshing to hear the candid viewpoint that he expresses here.

freediddy a day ago [ - ]

25k is definitely a lot but I did the risk analysis and I figured worst case I would lose $1000-2000 after a year of playing around with it, so I look at it more like renting (I'm going to keep the MacBook Pro no matter what since I needed a new one).

cronin101 a day ago [ - ]

Nitpicking, but the worst case of spending $25k is unforeseen circumstances that write off the entire asset. I don’t think -$2000 is a conservative enough figure for standard depreciation either (a lot can happen in a year)

throwawaytea a day ago [ - ]

Either I don't understand the used Apple market, or I agree this is crazy. Someone spends $25k on new hardware, waits a year, and expects to sell it for $23k? Unless the RAM issues save him, and the cost of new goes up, I don't see how that was going to work.

cortesoft a day ago [ - ]

Well, Apple is literally not offering the M3 512GB studios currently. You can’t even back order one.

They are selling on eBay for over $20k, used.

digitaltrees 20 hours ago [ - ]

It’s hard to know if any of these eBay listings are real or actual sales. Lots of scams.

gbgarbeb 18 hours ago [ - ]

The ones sold for $25k from established sellers are legit. Filter by "sold."

The 0-reputation account in Spain selling an M3U 512GB for $4200 is 100% fraud.

digitaltrees 17 hours ago [ - ]

Oh young grasshopper, I see you don't know that money launderers love the eBay hype cycle. It's REALLY common on high-dollar hot items to have phantom transactions where the same parties are on both sides of the transaction to clean illicit money. The high price tag and high volume of transactions hide the illicit signal. I have tried to buy a few of these Mac Studios only to have the transaction cancelled because I wasn't the dirty money on the other side.

bionsystem 11 hours ago [ - ]

Funny, they have the "honesty" to cancel the transaction and not take your money, just to keep their eBay reputation high?

razakel 6 hours ago [ - ]

eBay and PayPal almost always side with the buyer.

rithdmc 10 hours ago [ - ]

The transaction was cancelled? Sounds like you weren't defrauded.

dpark 7 hours ago [ - ]

GP didn’t say they were defrauded. They said the listing was a cover for laundering money.

rithdmc 6 hours ago [ - ]

"weren't scammed" might have been a better choice of words, which they said one post up.

dpark 6 hours ago [ - ]

Ah, I see what you mean now.

gbgarbeb 15 hours ago [ - ]

Bizarre. They don't care that eBay takes 14%?

swiftcoder 14 hours ago [ - ]

14% seems like a pretty low fee to clean drug money, if we're being honest

rithdmc 10 hours ago [ - ]

Traditional money laundering loses upwards of 40%, so hell yeah.

encom 5 hours ago [ - ]

What am I going to do with 40 subscriptions to Vibe?

jcelerier a day ago [ - ]

You could almost sell an RTX 3090 for more today than what it cost brand new when it came out six years ago

conradkay a day ago [ - ]

It's still very contrarian to expect GPUs won't depreciate rapidly. Yes 3090s were a good investment then, but way worse than just buying Nvidia stock directly

freediddy 8 hours ago [ - ]

How did we go from "I expect to lose only 1000-2000 if I try to sell my used equipment" to "you should have just bought NVDA to get a better return." The point wasn't the better return, the point is that I wouldn't lose all my initial investment if i decided I wanted to sell it.

And the fact of the matter is that in 2026, all electronics have gone up, not down, and sought-after GPUs have gone up in price in the used market.

latentsea 19 hours ago [ - ]

Waiting for them to come down any day now. Been waiting since 2017.

oblio 15 hours ago [ - ]

First it was crypto, now AI. Just because the market can stay irrational for very long doesn't mean crashes don't happen. What nobody knows is when.

baq 11 hours ago [ - ]

at some point you have to accept that the market is actually rational

oblio 10 hours ago [ - ]

Yup, just like crypto and tulips were rational behaviors for the global economy. Or investing 5% of world GDP in half-baked AI is.

baq 10 hours ago [ - ]

wanting to get rich quick is completely rational if you ask me ;)

oblio 7 hours ago [ - ]

Only if they do get rich, if they lose money, it's stupid. Technically stupidity is also rational? The bottom of rationality? :-))

hnav a day ago [ - ]

This is the case in lots of markets, e.g. look at used cars, luxury goods and more. Some of it is driven by inflation/the rapid devaluation of the dollar. General and AI-adjacent compute in particular hasn't come down in price in a long while.

majormajor 20 hours ago [ - ]

Apple products have had relatively high resale for a while. Only losing 8% in a year is probably extra unusual, and 1-year-old wasn't really ever the sweet spot, but a "sell used privately after a few years, roll onto the new one" has been a relatively common play.

Doing this particular one is definitely expecting the market squeeze to continue. "Worst case" is back to more "normal" depreciation. Where I'd expect to only be able to recoup more like 18k. But... if you look at GPU prices the last 3 years... it's not a crazy assumption that it won't drop that fast.

iPhone example since those are easiest to find in quantity: new iPhone 16 Pro Max for $1200, Gazelle would want $866 for "excellent" condition. Lost ~28% for one-model-back. iPhone 15 Pro Max, though: excellent priced at $667 here, only down another 23%, and gives you basically a half-priced upgrade if you can sell it for that and roll into the newest.

So to have never-more-than-one-model-old rough estimate at today's value-holding you'd be out $3600 for three new phones, with getting 1732 of that back, or 1868 for it (with a $334-per-year incremental cost of upgrade).

For never-more-than-two-models-back you'd be out $2400, getting back $866, for net $1534 spend, with a $167 incremental per-year upgrade cost once you buy the first one. Pretty good if you keep the phone in excellent condition and are happy to budget a bit over $10/month to be on an every-two-year upgrade train.
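As a quick check on the first scenario, the resale arithmetic can be sketched in Python. The three prices are the ones quoted in this comment; nothing else is assumed, and taxes and selling fees are ignored, as they are in the comment's own figures.

```python
# Prices quoted in the comment (new price and Gazelle "excellent" listings).
NEW_PRICE = 1200
ONE_MODEL_BACK = 866      # resale value after one model generation
TWO_MODELS_BACK = 667     # resale value after two model generations


def pct_drop(before: float, after: float) -> float:
    """Percentage of value lost going from `before` to `after`."""
    return (before - after) / before * 100


first_drop = pct_drop(NEW_PRICE, ONE_MODEL_BACK)
second_drop = pct_drop(ONE_MODEL_BACK, TWO_MODELS_BACK)

# "Never more than one model old": buy three new phones over time and
# resell the two older ones at the one-model-back price.
one_back_spend = 3 * NEW_PRICE
one_back_recovered = 2 * ONE_MODEL_BACK
one_back_net = one_back_spend - one_back_recovered
upgrade_cost_per_year = NEW_PRICE - ONE_MODEL_BACK

print(f"first drop: {first_drop:.0f}%, second drop: {second_drop:.0f}%")
print(f"net spend: ${one_back_net}, per-year upgrade: ${upgrade_cost_per_year}")
```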

Well, you'd also eat the tax...

https://buy.gazelle.com/products/iphone-16-pro-max-256gb-unl...

https://buy.gazelle.com/products/iphone-15-pro-max-256gb-unl...

consp 16 hours ago [ - ]

What you describe is something people with enough to make the first purchase and eat the cost when it breaks have been doing for years e.g. with cars. People on the lower end of money scale tend to use products for well over their economic lifetime saving way more and buying a cheap replacement if it breaks. Notable exception as stated being phones for some reason as it likely is a status symbol for more people in a [insert preferred external sexual characteristic] measuring contest.

bionsystem 11 hours ago [ - ]

> "Worst case" is back to more "normal" depreciation.

I would absolutely not count on that, if and when it drops it will drop hard.

UltraSane a day ago [ - ]

A lot of expensive things hold their value well. I have a friend who is really into telescopes and he now owns a $100,000 telescope but he didn't directly buy such an expensive scope. He started out with much cheaper ones and was able to sell them for about what he bought them for to help fund more expensive ones over 20 years. It is really interesting.

balls187 17 hours ago [ - ]

Thank god. I’ve been waiting to have a reason to tell someone I collect fountain pens.

Computers depreciate because they are obviously being supplanted by newer better models—until they become vintage and then move into collectibles.

globnomulous 9 hours ago [ - ]

I wouldn't call this nitpicking. This is how people who are careful with money think. I learned embarrassingly late to stop justifying purchases by making predictions about future returns. I treat everything as having zero value as soon as I purchase it. Thinking otherwise is, for me, always a dangerous rationalization -- always a craving that's trying to outmaneuver sense.

xienze a day ago [ - ]

> I don’t think -$2000 is a conservative enough figure for standard depreciation either (a lot can happen in a year)

We aren't exactly in "standard" times and haven't been for quite a while. Even five year old graphics cards are worth more today than they were just a year ago. Things will obviously depreciate at some point, but you gotta throw your existing notions of how quickly and how much hardware will depreciate out the window. There's just been too much money dumped into AI for a "well I guess this won't ever pan out, let's dump all this hardware to recoup our costs" moment to happen and tank the price of everything suddenly IMO.

And that's not even getting into the other geopolitical stuff going on right now. Strange times.

BLKNSLVR a day ago [ - ]

Aren't things like this seeing 'negative depreciation' these days?

manyatoms a day ago [ - ]

"inflation"

jiggunjer 21 hours ago [ - ]

"appreciation"

margalabargala 20 hours ago [ - ]

Sure, they took a gamble that they wouldn't be able to sell it used.

If you are able to tie up $25k for a few years just for shiggles, you clearly are able to make do fine without that money and if lost it would be at worst annoying, not catastrophic.

jaxn a day ago [ - ]

I assume he is calculating the loss as depreciation - what they would have spent on cloud bills if they hadn’t been doing this locally.

kristopolous a day ago [ - ]

I mean whatever. It's workstation/server class hardware, that's how much it's been for a long time

MagicMoonlight a day ago [ - ]

Was this risk analysis just AI slop?

blitzar a day ago [ - ]

I think op would make a really good pope too.

https://news.ycombinator.com/item?id=48118672

girvo a day ago [ - ]

I didn't spend that much, only $6500 AUD for a GB10-based Asus GX10 which is even slower than OP's, but I spent that because it makes for a great learning platform. There's not much else that lets me fiddle with 128GB of RAM for my graphics processor, and it's quite lovely to be able to run things as long as I like without worrying about my cloud instance being shut down.

It's not financially a good idea: renting really does beat owning, and cloud beats both if you're only running inference on these machines. But I'm not just doing inference, and as a thing I can do silly stuff on to learn, it's hard to beat!

a1o a day ago [ - ]

When you say you are not just doing inference, you mean you are also training your own llms? I am curious what other things can be done.

girvo 16 hours ago [ - ]

Fine tuning, and yeah training my own, experimenting with architectures and learning how it all works. Been a lot of fun

nostrebored 20 hours ago [ - ]

$6500 AUD can get you a good chunk of B200 time on any of the GPU neoclouds :)

girvo 16 hours ago [ - ]

Less than I expected, though! And I get to run this all through the night

I do still use Vast and Runpod for things too, but it’s much nicer to test a fine tuning run here to make sure I’m in the ballpark

I also did literally say “It's not financially a good idea, renting is better than owning” so I’m confused why I have two people telling me that

Also it’s just far more fun to play with something tangible to me :)

andoando 19 hours ago [ - ]

You could just rent a bare metal server with those specs

girvo 16 hours ago [ - ]

Yes I could, but that is annoying because of spot pricing and having my instance shut down, and it has fluctuating prices

It’s also annoying because then I need to make sure my little “lab” setup is well automated, and I’m lazy :)

Also, I literally said “ It's not financially a good idea” so I’m confused why you think I don’t know that.

adastra22 15 hours ago [ - ]

Spot pricing and instance availability don’t apply to on metal hosting. You’d have your own machine dedicated to your own use only, at a locked in price.

girvo 12 hours ago [ - ]

> renting really does beat owning, and cloud beats both

hnuser123456 a day ago [ - ]

Privacy and offline operation are valuable or non-negotiable in some cases, but the difference is pretty categorical between what can run on a single card and what can run on a DGX GB200 NVL72 cabinet. Doesn't mean it's not worth seeing how far local models can be pushed. Not every problem needs a senior engineer.

dylan604 a day ago [ - ]

I know it's one of those "if you have to ask" situations, but curiosity got the better part of me. Here's the search assist response:

"The DGX GB200 NVL72 AI server costs approximately $3 million per unit. This system includes 72 Blackwell GPUs and 36 Grace CPUs, making it one of the most powerful AI servers available."

The search assist actually credited a source used with: https://www.tweaktown.com/news/98292/nvidias-new-gb200-super...

That $25k spend by GGGP seems like nothing in comparison. That's ~1/3 of one chip in that cabinet. Good gawd I'm old and out of touch with modern AI data centers.
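The "~1/3 of one chip" figure works out if "chip" is read as one GB200 superchip, as this rough Python sketch shows. The $3 million price and the 72 GPU / 36 CPU counts come from the quoted search result; treating the cabinet as 36 superchips (one Grace CPU plus two Blackwell GPUs each) and spreading the whole price evenly across them is an assumption made here, and it ignores networking, cooling and everything else in the rack.

```python
# From the quoted search-assist text.
CABINET_USD = 3_000_000
GPU_COUNT = 72
CPU_COUNT = 36

# Assumption: one "chip" = one GB200 superchip (1 Grace CPU + 2 Blackwell GPUs).
SUPERCHIP_COUNT = CPU_COUNT

HOBBY_SPEND_USD = 25_000   # the original poster's hardware spend

# Naive even split of the cabinet price; not a real bill of materials.
usd_per_gpu = CABINET_USD / GPU_COUNT
usd_per_superchip = CABINET_USD / SUPERCHIP_COUNT
spend_as_fraction_of_superchip = HOBBY_SPEND_USD / usd_per_superchip

print(f"~${usd_per_gpu:,.0f} per GPU, ~${usd_per_superchip:,.0f} per superchip")
print(f"$25k is about {spend_as_fraction_of_superchip:.2f} of one superchip")
```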

nl 16 hours ago [ - ]

By comparison, the Colossus 1 data center had 32,000 GB200s (as well as 150,000 H100 GPUs, 50,000 H200 GPUs), and they are bringing another 110,000 GB200s online (although this might be Colossus 2?)

There are bigger data centers than Colossus 1 around too.

There is a reason NVidia is the most valuable company on the planet.

https://en.wikipedia.org/wiki/Colossus_(supercomputer)#Curre...

TheOtherHobbes a day ago [ - ]

It's The Circle of Computing Life. The pendulum swings between centralised mainframe timesharing-for-hire and desktop individuality.

We've been in a centralised phase for longer than usual - first cloud everything, then AI - but at some point in the next decade prices will crash and a market will appear for personal, local intelligence.

zozbot234 a day ago [ - ]

> the difference is pretty categorical between what can run on a single card and what can run on a DGX GB200 NVL72 cabinet.

A better way of putting it is that you can run plenty of things on a single ordinary system, but you may be disappointed at the performance. Generally, you can't expect inference to be as quick as with cloud for SOTA-like models. You have to run smaller models for quick replies, and large models with a lot of real-world knowledge for less time-critical inference, possibly batching many requests simultaneously to improve throughput.

_the_inflator a day ago [ - ]

One year ago, fine-tuned local LLMs had a significant edge over ChatGPT or Claude. Look up on YouTube all the DIY videos testing LLMs on their own machines with different setups.

Remember: one year turned out to be a gigantic leap in quality of results and innovation in the AI space. Agents weren't really a thing and vibe coding wasn't even invented as a term because the top-notch tools at the time were lousy, with Lovable being the frontrunner with its (in my view) sorry Tailwind recombination tool shaming AI to do the work.

Then fall 2025 hit us, then New Year's Eve, and suddenly there was such a massive surge of innovation and competition, with ChatGPT Codex suddenly showing up.

Remember: one year ago many now commonly used tools, like Nano Banana or Codex, weren't yet available.

"The 25k are so vast" - Yes, and no. For example, if the machine is bought for business usage I can deduct the costs from taxes. This roughly amounts to 50% of the financial burden.

So I jokingly say that I pay only half the price for my Apple business machines. And yes, I am strict in this regard. Business means business. No private emails etc., nothing on my company computers.

Maybe there are other options as well to reduce the financial expenses the dude mentions, but it doesn't seem so.

I would also go for leasing; this way the monthly payments can already be deducted and I don't need to buy and maybe resell the machine.

Apple is a luxury good. Without business usage or at least partly using it for business as well as private (mixed usage in tax reports) I wouldn't buy the devices or think twice.

Apple under Cook evolved into a Gucci-like luxury brand that is more and more a rip-off rather than quality delivered, especially considering the latest OS updates for Mac, iOS and iPad. Apple is a mess, happily following in Microsoft Windows' footsteps, because the CEO is, as has been correctly assessed, no product guy.

But I stop with my rant here.

Always try to use tax deduction as leverage for your computer expenses. Every citizen should invest in basic knowledge about that.

Even a 10-20% professional usage for work (mixed usage) gives you a noticeable advantage over normal pay.

freediddy a day ago [ - ]

It's just a project I'm working on. I'm working on projects where AIs are processing and classifying large amounts of data that would be a lot of work for humans to do.

wutwutwat a day ago [ - ]

I think of LLMs as being well equipped for handling dynamic data or adapting to unforeseen circumstances (random code requests, websites' ever-changing layouts, typos, non-standard formatting in docs, grokking out important info, etc), but math problems are by definition a very specific set of instructions to run, so is the overhead and "thinking" aspect of an LLM/AI even needed here? I'm genuinely curious, btw, I'm not asking sarcastically. Can't these math problems just be yanked from some test file and rapid fired directly at a gpu/compute unit?

freediddy a day ago [ - ]

> Can't these math problems just be yanked from some test file and rapid fired directly at a gpu/compute unit?

Yes this is exactly what I'm doing. I isolated the actual math question, and then sent it to my two servers to process and that's what's taking 10m+ to return. I'm asking them to solve the question and return the full answer along with their steps. I care about correctness so taking time is okay but I can't use 10m per solution.

jachee a day ago [ - ]

Nono, parent was asking “They’re bad and inefficient at that, so why have an LLM do math? Why not just use some code and the CPU/GPU that’s already good and efficient at basic math?”

pjc50 10 hours ago [ - ]

This is making me feel a lot better about my plan to lease a $25k EV simply because it's available at a massive discount. I'll probably end up using less electricity, too.

Retric a day ago [ - ]

That hardware is costing him ~1$/hour over 3 years. Presumably having it answer math questions was a tiny fraction of what he was using it for.
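The ~$1/hour figure follows from spreading the $25k purchase over three years of continuous hours, as in this small Python sketch. It assumes zero resale value and ignores electricity, which is a simplification made here and not something the comment states.

```python
HARDWARE_USD = 25_000      # total spend quoted by the original poster
YEARS = 3                  # amortization period used in the comment

# Assumption: the hardware is available 24/7 and is worth nothing afterwards.
hours = YEARS * 365 * 24
usd_per_hour = HARDWARE_USD / hours

print(f"{hours} hours -> ${usd_per_hour:.2f}/hour")
```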

iwontberude a day ago [ - ]

I’ve spent twice that on hosting movies and tv for Plex, so… I think they are worthy of my praise. What a healthy outlet for money.

root_axis a day ago [ - ]

You spent 50k for plex hosting? Why so expensive?

iwontberude a day ago [ - ]

Half a petabyte of RAID6 is the biggest line item, then the redundant 40gb networking and compute follow closely. I have a lot… too much even?

swiftcoder 14 hours ago [ - ]

does one really need 40gb networking to stream Blu-rays?

simplyluke 9 hours ago [ - ]

I am (clearly) not as far down the rabbit hole as the commenter you're replying to, but almost certainly not. Streaming 4K Blu-ray is on the order of ~100Mb/s, which means on a LAN bog-standard gigabit ethernet and associated networking hardware would be more than sufficient.
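To put the bandwidth point in numbers, here is a minimal Python sketch using the ~100 Mb/s per-stream figure from this comment. It compares raw link rates only and assumes no protocol overhead or other traffic, so real capacity would be somewhat lower.

```python
STREAM_MBPS = 100          # approximate 4K Blu-ray bitrate from the comment
GIGABIT_MBPS = 1_000       # standard gigabit ethernet
FORTY_GIG_MBPS = 40_000    # the 40 Gb link mentioned upthread

# Whole simultaneous streams that fit in each link's nominal rate.
streams_on_gigabit = GIGABIT_MBPS // STREAM_MBPS
streams_on_forty_gig = FORTY_GIG_MBPS // STREAM_MBPS

print(f"gigabit: {streams_on_gigabit} streams, 40 Gb: {streams_on_forty_gig} streams")
```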

This is taking a hobby to its extremes, in much the same way that a $5k boat and $500k boat let you catch the same fish.

iwontberude 6 hours ago [ - ]

It’s about being able to rapidly move files between the arrays and future proofing.

iwontberude 6 hours ago [ - ]

You are totally right, it’s mostly just for backups and transfers to rebalance data.

iamacyborg a day ago [ - ]

That’s a lot of blurays…

iwontberude a day ago [ - ]

I found them dumped out in the “street” in a place ordained by law as public domain. So I just grabbed up the media and use it in private.

marai2 a day ago [ - ]

How many Blurays are we talking about?

LtdJorge a day ago [ - ]

Can't reply to the other poster, but I have 4K HDR Blu-ray copies from discs I found in the street too, which are more in the 60GB ballpark.

yard2010 7 hours ago [ - ]

What kind of "streets" are you guys hanging around in?

carlob a day ago [ - ]

500 TB / 25 GB = 20,000

If some have more than one layer it could be fewer, but that's the order of magnitude.

bronson 21 hours ago [ - ]

If each Bluray is 2 hours long, that's 4.5 years of nonstop watching.
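Both estimates can be reproduced with a few lines of Python. The 500 TB array size, 25 GB per disc and 2 hours per disc are the figures used in the thread; decimal units (1 TB = 1000 GB) are assumed here.

```python
ARRAY_TB = 500             # "half a petabyte" from upthread
DISC_GB = 25               # single-layer Blu-ray size used in the estimate
HOURS_PER_DISC = 2         # runtime assumed in the comment above

# Assumption: decimal units, 1 TB = 1000 GB.
discs = ARRAY_TB * 1000 // DISC_GB
watch_hours = discs * HOURS_PER_DISC
watch_years = watch_hours / (24 * 365)

print(f"{discs} discs, {watch_hours} hours, {watch_years:.1f} years nonstop")
```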

cdud3 8 hours ago [ - ]

Just parallelize watching.

sfn42 2 hours ago [ - ]

I have 50 agents watching blurays for me 24/7

yard2010 7 hours ago [ - ]

Damn, this is just art at this scale

ActorNightly 21 hours ago [ - ]

Because buying Macs is not about performance, it's about feeling like you are rich.

That money could have been spent on way more bang/buck performance in the form of a set of 4 graphics cards.

Also I would probably put the odds 70:30 that Apple marketing is astroturfing on HN from the amount of posts about running llms on Macbooks, because in reality, the inference speed of any decent llm is unusable on a Macbook despite the ability to fit it into RAM.

gbgarbeb 18 hours ago [ - ]

40-80 tok/s is unusable to you? Ok.

If you like having a box with 8-12 fans blasting hot air and noise into your office all day, nobody's stopping you.

topaz0 20 hours ago [ - ]

Or it could have had way more bang/buck by feeding a family of real brains for a year or two

yard2010 7 hours ago [ - ]

Excuse me for this comment, really, but I can't comprehend the absurdity, some people are buying GPUs when other people have no money for insulin so they literally die. I don't mean anything towards op or gp, quite the opposite I'm truly happy they have this kind of freedom, it must feel really nice, I just hate this game so much.

skiing_crawling a day ago [ - ]

I got an RTX 6000 pro too. I like running locally, I've learned a lot more than if I had used an API and there's less worry about overspending tokens. I accidentally spent $100 on claude api in like 2 days because I didn't know what I was doing.

The problem is that while one of these GPUs is a huge improvement over a laptop or a single 3090, you very quickly wish you had more. I would buy a second one, but I did the math and realized that with the current crop of models, 2 Blackwells don't buy me any new capability that I didn't have with one. So I would need a 3rd one. And when I buy a 3rd one I will feel like I want to run a higher quant, so then I will want a 4th.

arjie a day ago [ - ]

You can fit Deepseek 4 Flash on two with TP 2 and 6 different streams at 65k context. 150 tok/s

CamperBob2 a day ago [ - ]

A pair of RTX6000 cards will give you a good performance boost due to tensor parallelism, though. I haven't tried the newest predictive quants but I see about 35 tps when running the 8-bit Qwen 3.6 27B model on one board and about 50 tps on two. Probably could come close to 100 tps on an optimized setup with the latest GGUFs.

Also, the 4-bit quants of MiniMax 2.7 will run at 100 tps or so with two cards, which is pretty decent. It doesn't go any faster at all with 4 GPUs from what I've seen, so if you don't actively need 384 GB of VRAM, 2x RTX6000 is a good place to be.
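To put the one-card versus two-card numbers in perspective, a rough sketch of the scaling efficiency, using only the ~35 and ~50 tok/s figures quoted above. The helper name is hypothetical and the result is only as good as those rough measurements.

```python
def scaling_efficiency(tps_single: float, tps_multi: float, n_gpus: int) -> float:
    """Fraction of ideal linear speedup achieved when going from 1 to n_gpus cards."""
    speedup = tps_multi / tps_single
    return speedup / n_gpus

# Figures from the comment above: ~35 tok/s on one card, ~50 tok/s on two.
speedup = 50 / 35
efficiency = scaling_efficiency(35, 50, 2)

print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

That works out to roughly a 1.4x speedup, or about 71% of ideal 2x scaling. The second card helps, but well short of doubling throughput.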

skiing_crawling 20 hours ago [ - ]

You can get 70-80 tps on qwen3.6-27b f16 with MTP on a single card

Melatonic 15 hours ago [ - ]

What kind of machine did you build around it ?

glaslong an hour ago [ - ]

Same as it ever was with "cloud" eh? The advantages of small-scale on-prem are never cost or quality, they are strictly privacy and sovereignty. No one can rug pull you with a week of Claude regressions. No one has access to your sensitive data.

vessenes 6 hours ago [ - ]

I don't think this changes the final conclusion - but have you considered calculating against depreciation -- i.e. figuring out how much your M3 ultra is worth today, and only charging yourself for the delta? In my mind you might even have made money on the hardware.

tpurves a day ago [ - ]

>> find a reliable place to sell it without getting ripped off by scammers.

This is a real problem and why I've just about given up on ebay or fb marketplace, esp for computers. If you are in Canada though sellit9.com is a great solution to having to deal with sketchy buyers.

tracker1 a day ago [ - ]

If you're in a decent sized city, you should be able to find a local buyer on Craigslist or FB Marketplace... Beyond that, for higher value, smaller items like your M3 Ultra, I would talk to your local police department and/or library to see if you can do the exchange there. Larger libraries usually have a police officer on site or nearby, and the PD office near you may also provide a "safe" exchange location... I'd bring a monitor/keyboard/mouse so you can demonstrate the system working properly.

YMMV but between your nearest PD office and Library, you should be able to use one or the other for your exchange of goods/money. The biggest thing I've sold is a mid-range video card during late covid (I managed to get a better one via newegg shuffle) so I sold the old one (RX 5700XT -> RTX 2080) to make up the difference a bit. I just did the exchange at the Starbucks near me for that.

thefounder a day ago [ - ]

Something is very wrong in some countries if you have to get police protection to sell a f* computer. I get it’s on the expensive side but still….

michaelt a day ago [ - ]

See e.g. https://www.murphytx.org/843/Safe-Exchange https://www.ottawapolice.ca/en/community-safety-and-crime-pr...

Police "safe trade zones" are basically a parking space outside a police station, with a sign.

tracker1 a day ago [ - ]

You don't have to... but it's a matter of a safe location for both parties. If it was more expensive, I'd probably work through a broker (like a car or house).

The buyer doesn't know who the seller is, and vice-versa... the level of trust you can bear depends on how much you're willing to lose. My advice is only in that there are safe venues you can use to make such an exchange.

fc417fc802 a day ago [ - ]

Not really. Every country has a nonzero number of criminals. It's entirely a matter of the risk/reward tradeoff. A small consumer item over $10k is well into dangerous territory.

mixmastamyk 20 hours ago [ - ]

Are we talking about a cash transaction? If so >$10k is dangerous as the police may want to steal it themselves.

If it is an electronic payment, I'm not sure how completing the transaction in front of a police station will help any. Well, it will help the buyer to see it working, but the seller gets no additional protection besides seeing "a person."

pwg 8 hours ago [ - ]

> If it is an electronic payment, I'm not sure how completing the transaction in front of a police station will help any.

That's not the point of going to the "police safe exchange zone".

The point is to hopefully prevent the possibility of the buyer showing up with a .38 in hand, and demanding to be given the easy to fence "item" unless the seller wants to get a .38 slug embedded in their gut.

The risk of a "hold up" increases with dollar value and with items that are easier to fence.

542354234235 7 hours ago [ - ]

I love that it never occurred to you that the "buyer" could just steal the item. Be safe out there.

mixmastamyk 5 hours ago [ - ]

Ask a question, get a condescending answer.

fc417fc802 17 hours ago [ - ]

Why would the seller worry about ... himself? The seller worries that the buyer might not have any intention to pay for the small, expensive, easy to fence item to begin with. Conversely, the buyer worries that the seller might not have brought an item at all.

digitaltrees a day ago [ - ]

I have three m3 512gb units and want a fourth to run an exo set up. Like you, I am worried about scammers. Let’s discuss if you still want to sell.

https://calendly.com/ryanwmartin/open-office-hours

jrmg 9 hours ago [ - ]

If you run it in the winter the electricity is “free” because it’s replacing a portion of whatever else heats your house.

beernet 6 hours ago [ - ]

Yep, the great theoretical promise of local models remains theoretical, no matter how much die-hard engineers want to push it... Who would have thought, right?

bethekind a day ago [ - ]

Which of these has been the most productive for you? Sounds like you've enjoyed the RTX6000 the most?

freediddy a day ago [ - ]

RTX 6000 is somewhat obviously my fastest card, but my biggest problem with the RTX 6000 is the immense heat. The GPU itself is almost 200F and the exhaust from the fans is over 150F. I'm worried that my hard drives are going to fail. I was told that the GDDR7 is even hotter than the GPU, which is surprising to me.

After my last run, I'm going to wait for the new case I ordered to come in and cannibalize my kid's PC that we built beginning of this year to form an entirely separate computer. And then figure out better ways to deal with the heat, especially with summer coming up. I'll have to play around with undervolting and running vents directly outside my house to see if that helps.

vladgur a day ago [ - ]

From my failed and expensive affair with GPU mining 5 years ago: you can get great heat dissipation by using an open case with a lot of directed fans, at the expense of a bit of dust and lots of noise.

theYipster a day ago [ - ]

That's about what my OC'd and watercooled 4090 runs at. The cards are designed for it. Only problem I have is when sitting next to the computer under load -- I either have to open windows or blast the AC. Too bad I don't live in a cold climate -- that 60c heat output would come in handy :)

ChoGGi 20 hours ago [ - ]

> Too bad I don't live in a cold climate -- that 60c heat output would come in handy :)

Used to overclock back in the day during winter with an intake duct rigged to suck in outside air, best thing about -30c :)

Silagi 10 hours ago [ - ]

I've always thought about doing something like this in the Midwest US, but was always a bit nervous about condensation damaging the components over time; did you run with that sort of setup consistently, or only when pushing high scores? Ever run into issues with components failing?

ChoGGi 10 hours ago [ - ]

That was 25 odd years ago, less sensitive hardware and cheaper... Nothing that failed though, did have some sketchy moments with condensation yeah :)

Not consistently, I did start using petroleum jelly till I upgraded and found out that wasn't very fun to clean up.

dylan604 a day ago [ - ]

Since you are not running realtime 3d grafix, could you put the card in an external chassis so the heat is not in the same box as the SSDs?

ericd a day ago [ - ]

I take it this wasn't the half-wattage Max Q version with blower fan?

arjie a day ago [ - ]

All of these have appreciated in value. How much are you looking for the Ultra?

freediddy a day ago [ - ]

I've seen a lot of sales on eBay for over $20k, but I don't know if I believe it. Plus the lack of seller protection and the prevalence of scams on eBay make me too hesitant to actually want to risk it so I don't know what to do haha

arjie a day ago [ - ]

Haha, yeah, it's about $23k or so. Should be twice what you paid for it if you got it last year. Tbh I don't know why. The RAM is large but the bandwidth and the compute aren't nearly enough. You can fit DeepSeek V3 on it quantized but inference is like 10 tok/s. Honestly, you'll be able to sell it locally for that in cash, and I would in your place.

I saw your heat comments about the RTX 6000 Pro as well. I bought a few of them recently and I'm running 2 of them in a 2U case in a colo. You need a lot of active airflow to keep them cool. Mine range from 23 C to 80 C.

LarsDu88 a day ago [ - ]

Well if it makes you feel better those frontier LLMs are all technically taking a big loss, and they may all be in your shoes after a few years.

throwaway2037 19 hours ago [ - ]

    > if I can find a reliable place to sell it without getting ripped off by scammers.

I don't follow this last part. What is the scam they try to run?

gbgarbeb 18 hours ago [ - ]

A buyer can claim they never received it or that the box was empty, thus receiving a full refund.

For something listed at $25k I would not list on eBay at all. eBay corporate will pocket $3400 in fees and will also dock you local taxes on the $25k.

onlyrealcuzzo a day ago [ - ]

> I figured worst case scenario I can sell them in the next year and only take a haircut as opposed to losing my entire investment.

It's going to be a non-trivial haircut. This stuff depreciates pretty fast.

michaelt a day ago [ - ]

Bizarrely, I bought a GPU new in Jun 2024, and there are sold eBay listings saying the used GPU is worth 4% more today.

Of course, this is an unusual state of affairs; I see my GPU purchase as consumption, not investment.

jma24 20 hours ago [ - ]

Better sell it fast before the M5 ones come out.

plasticsoprano a day ago [ - ]

You'll probably make a profit by selling them today. I bought a M1 Max Studio with 64 GB last year off FB Marketplace for $1000 and today I'm seeing numerous 32 GB M1 Maxes for $1200-1500.

freediddy a day ago [ - ]

Yes the prices on eBay for the Mac Studio are all over the place, but I've seen sales for over $20k. I don't know if I believe it but there's enough to make me think if I can sell it for that price it would be worth it, but eBay has basically no seller protection so I'm not willing to take that chance.

balls187 17 hours ago [ - ]

I’ve had the best luck selling on Craigslist. Every other platform has been subpar.

ahmadyan a day ago [ - ]

If you are in the bay area, i'm happy to buy that M3 Ultra from you, i've been unsuccessfully looking for one and can't find any.

mountainriver a day ago [ - ]

Running LLMs on Macs is still terribly slow. They simply lack the optimizations other platforms have.

An RTX 6000 pro Blackwell is a pretty good card

speedgoose a day ago [ - ]

An M3 Ultra Mac Studio can run models that do not fit in similarly priced computers with multiple Nvidia GPUs. And it will use a lot less electricity while still having good enough performance, except for prefill performance, which is quite poor on the M3.

ttoinou a day ago [ - ]

M5 pro 48GB should be good and future proof

thefounder a day ago [ - ]

If you buy a Mac, get at least 256GB RAM; otherwise just buy a bunch of Nvidia cards. It really does not make sense otherwise if you are looking for performance / $. The Mac (Studio) is unique as it has more RAM than the alternatives (i.e. consumer Nvidia cards or Spark stuff), so it can fit bigger models, but otherwise its performance is worse.

de6u99er a day ago [ - ]

You definitely want to get rid of your M3 Ultra before the M5 Ultra gets officially announced.

digitaltrees 17 hours ago [ - ]

Given the global memory shortage, the M5 will be both delayed and restricted to lower RAM tiers. I don't think we will see a 512GB RAM model until 2030.

tencentshill a day ago [ - ]

No harm in listing it for $20k, and if it sells, that's an easy $5-10k for you.

Melatonic 15 hours ago [ - ]

How are you using the 6000 with a Mac ?

iooi a day ago [ - ]

I'll buy your macbook if you're trying to get rid of it!

freediddy a day ago [ - ]

I'm keeping that one for sure, I love it!

daemin 19 hours ago [ - ]

Given that the tokens are being subsidised by a couple orders of magnitude, would it still be as cost effective long term?

wslh a day ago [ - ]

I'm not really asking this from the perspective of whether I should buy hardware. I'm trying to understand the economics.

The AI space is moving so fast that it is hard to know which conclusions are stable. After all the discussion around local models, is the practical conclusion still that API/frontier providers have a huge structural advantage because of datacenter hardware, high utilization, batching, optimized inference stacks, and perhaps strategic pricing?

In a comparison like this, a $25k local setup versus buying tokens, what multiple are we really talking about? 10x? 100x? Or is it too workload-dependent to reduce to a single number?

Has someone written a good breakdown that separates true infrastructure efficiency from temporary underpricing/subsidy? The part I'm trying to understand is less ideological (local vs. cloud) and more basic economics.

freediddy 8 hours ago [ - ]

The speed of results for an API call to ChatGPT is 10-100x faster than my local LLM. I haven't exactly quantified the results but I was getting results in a few seconds vs 10+ minutes for my local LLM. I'm going to do a deep dive this weekend and try to get better results, but it was staggering. I'll also do a deep dive on how to optimize my setup and see if I can get things to perform much quicker.
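A rough sketch of the multiple, using only the figures from the original post ($25 for about 1000 questions via the API, versus 40 questions in 7 hours at 800 W locally). Two inputs are assumptions and not from the thread: "one night" is taken as 8 hours, and the electricity rate is a hypothetical $0.15/kWh. Hardware cost and answer quality are ignored entirely.

```python
# Figures from the original post.
API_COST_USD = 25.0
API_QUESTIONS = 1000
LOCAL_QUESTIONS = 40
LOCAL_HOURS = 7.0
LOCAL_WATTS = 800

# Assumptions, not from the thread.
API_HOURS = 8.0        # "one night" taken as 8 hours
USD_PER_KWH = 0.15     # hypothetical electricity rate

api_rate = API_QUESTIONS / API_HOURS        # questions per hour
local_rate = LOCAL_QUESTIONS / LOCAL_HOURS
throughput_ratio = api_rate / local_rate

local_minutes_per_q = LOCAL_HOURS * 60 / LOCAL_QUESTIONS
local_kwh = LOCAL_WATTS / 1000 * LOCAL_HOURS

api_cost_per_q = API_COST_USD / API_QUESTIONS
local_power_cost_per_q = local_kwh * USD_PER_KWH / LOCAL_QUESTIONS

print(f"throughput ratio: {throughput_ratio:.1f}x")
print(f"local: {local_minutes_per_q:.1f} min/question, {local_kwh:.1f} kWh total")
print(f"per question: API ${api_cost_per_q:.3f} vs local power ${local_power_cost_per_q:.3f}")
```

Under those assumptions the API comes out around 22x faster in throughput, and the local electricity alone (about $0.021 per question) is in the same ballpark as the full API price ($0.025 per question) before counting any of the hardware.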

CamperBob2 a day ago [ - ]

How do you use the RTX 6000 with the Macs? Exo? I would think that would be pretty snappy if configured properly.

freediddy a day ago [ - ]

This is on a separate Windows PC, I don't have it integrated with the Macs.

CamperBob2 a day ago [ - ]

If you don't need cash right away, I'd wait until the M5 Ultra comes out and see how things shape up. There have been some early efforts aimed at combining the prefill performance of a GPU with the high throughput achievable with the Mac's unified memory architecture (see various YouTube videos by Ziskind and others, as well as https://old.reddit.com/r/LocalLLM/comments/1r6drpi/exo_clust... ).

Point being, once the M5 Ultra is available, I suspect a lot of people will get very serious about making Macs work with RTX GPUs because that will yield an inference platform with a good bang:buck ratio. If so, you may find that your existing hardware is more powerful than it seems today. And it may be a lot more expensive to replace later if you sell it now.

Craighead 20 hours ago [ - ]

I'd buy that Mac studio m3

jmyeet a day ago [ - ]

I looked into the M3 Ultra 512GB Mac Studio before it was discontinued, and as best I could determine it just wasn't worth it... yet. The GFLOPS and memory bandwidth just aren't there even though it can hold a much larger model in memory.

But the trend here is interesting. I think by 2030 you'll be able to buy fairly cheap hardware that is currently $10k+. I don't know what this does to the trillions invested in AI data centers because the next NVidia architecture after Blackwell will essentially halve the value of purchased cards overnight.

I'm not convinced Apple has yet pivoted the Mac Studio line towards this market and the expected M5 Ultras in Q3 2026 will likely be an incremental improvement rather than big leap forward but I'd like to be proven wrong.

freediddy a day ago [ - ]

I agree that all these datacenter companies like Coreweave are investing billions in technology that has a very fast depreciation curve and I don't know how they will sustain income. The same goes for datacenters in space: what happens when those chips are obsolete? Will they send astronauts to replace them or will they let them burn up and send new ones into orbit every year?

I feel that the open weight models pale in comparison to the frontier models, and I believe that if the gap closes quickly, that the open weight vendors will stop releasing it for free.

Atotalnoob a day ago [ - ]

Data centers in space aren’t realistic.

Higher radiation, space insulations, etc.

Underwater data centers provide a lot of the same benefits and can (much more) easily be hauled to the surface
