Hacker News

ux266478 14 hours ago

AI hardware is for inference, not training. Training uses normal HPC crap. Superpods aren't really power efficient; it's kind of a meme, and it stems from limiting the power draw of other components by having fewer of them. It's more of a rounding error.

> you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.

When costs are spread over a large population, it really doesn't matter. You're not going to get hundreds of thousands of people to pitch in half their monthly electric bill to pay for someone else's datacenter. They will quite happily pay the electricity themselves, though, if all they need to do is give you compute. This isn't new.

Interconnect is the bottleneck for distributed training, nothing else really.
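To put rough numbers on the interconnect point, here is a back-of-envelope sketch assuming plain synchronous data-parallel training with a ring all-reduce of fp16 gradients every step. The model size (7B parameters), the 1,000 volunteers, and the link speeds (100 Mbit/s home upload vs a 400 Gbit/s datacenter link) are hypothetical illustrative values, not figures from this thread, and latency and compute/communication overlap are ignored.

```python
def ring_allreduce_seconds(num_params, bytes_per_grad, workers, link_bits_per_s):
    """Seconds to all-reduce one full gradient with a ring all-reduce.

    Each worker sends (and receives) 2 * (p - 1) / p times the gradient size,
    so per-worker traffic stays close to 2x the gradient no matter how many
    workers join. Latency and compute/communication overlap are ignored.
    """
    grad_bytes = num_params * bytes_per_grad
    bytes_sent = 2 * (workers - 1) / workers * grad_bytes
    return bytes_sent * 8 / link_bits_per_s


# Hypothetical setup: 7B-parameter model, fp16 gradients, 1,000 volunteers.
home = ring_allreduce_seconds(7e9, 2, 1000, 100e6)  # 100 Mbit/s home upload
dc = ring_allreduce_seconds(7e9, 2, 1000, 400e9)    # 400 Gbit/s datacenter link
print(f"home: {home / 60:.1f} min per step, datacenter: {dc:.2f} s per step")
# prints "home: 37.3 min per step, datacenter: 0.56 s per step"
```

Under these assumptions every synchronous step waits over half an hour on home links versus about half a second in a datacenter, a 4000x gap before any compute happens.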

rurban 10 hours ago

You've got it wrong. Inference can use crap GPUs. Training needs the 100x more expensive big guns: our training machine is 100x more expensive than our inference machine.

bombcar 3 hours ago

How is the result of training stored? How big is that? It seems reasonable to assume we’ll eventually plateau and all we’ll need is relatively infrequent training.
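The output of a training run is essentially the learned weights: large arrays of numbers written to disk as a checkpoint and loaded again for inference. A rough size estimate is parameter count times bytes per parameter. A minimal sketch, with hypothetical parameter counts; it ignores file-format metadata, quantization scale factors, and any optimizer state saved for resuming training.

```python
# Approximate storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def checkpoint_gb(num_params: float, dtype: str = "bf16") -> float:
    """Rough size of the saved weights in GB (1 GB = 1e9 bytes).

    Ignores metadata, quantization scales, and optimizer state.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Hypothetical model sizes, for illustration only.
for params, dtype in [(7e9, "bf16"), (70e9, "bf16"), (70e9, "int4")]:
    print(f"{params / 1e9:.0f}B params as {dtype}: ~{checkpoint_gb(params, dtype):.0f} GB")
```

So a 7B model in bf16 is about 14 GB and a 70B model about 140 GB (about 35 GB at 4 bits); an image model like the one described below would typically be far smaller.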

rurban an hour ago

Not that often. The GPUs run at 100% for 3 weeks per training run. We only do images, but it's the same process. Otherwise we can use the costly GPUs for inference, e.g. local-model coding agents. We train about 4x a year, but it depends on what ideas the PM or the customers have. If they have more, there are more training tasks, e.g. more viruses to detect.
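Taking those figures at face value (about 4 runs a year, roughly 3 weeks each at full load), the share of the year the training machine spends training is:

```latex
\frac{4 \times 3\ \text{weeks}}{52\ \text{weeks}} = \frac{12}{52} \approx 0.23
```

so under those assumptions roughly three quarters of the year is left for inference work.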

brandensilva 2 hours ago

I agree: leave the training to open-source federations that roll out releases like operating systems, with minimal training over time.

Then have inference move down to the next layer, serving those models over a P2P decentralized network.

Maybe something like OpenRouter could tap into federation networks.

sho 14 hours ago

> AI hardware is for inference, not training

Not sure what you are referring to, unless you don't think the H100/H200/B200 are "AI hardware".

> Superpods aren't really power efficient

Maybe not compared to a specialized rig with multiple 4090s, but that is the best case for consumer hardware; the vast majority will be dramatically less efficient than that.

Anyway, I agree the interconnect is by far the biggest obstacle and seems insurmountable; I should probably have led with that.

pksebben 14 hours ago

Bit of a doozy, though, that one.

I recall getting really excited over Hinton's Forward-Forward (FF) foray, right before he bailed on AI as a societal direction (which, if anyone ever had the right, I suppose he does). If one squints, one can see a backprop-free base being much easier to train on geographically distributed, heterogeneous hardware.
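The Forward-Forward idea replaces backprop with a per-layer objective: each layer pushes a "goodness" score (the sum of squared activations) above a threshold for positive data and below it for negative data, and only a length-normalized copy of its output is handed to the next layer. Below is a minimal pure-Python sketch of that idea, not Hinton's reference code; the toy dataset, the threshold `THETA = 2.0`, the learning rate, the missing biases, and the greedy layer-by-layer schedule are all assumptions for illustration. The relevant property here is that `local_update` uses only one layer's own weights and input, so no gradient has to travel between layers.

```python
import math
import random

THETA = 2.0  # goodness threshold (hypothetical value)
LR = 0.05    # learning rate (hypothetical value)


def relu_layer(weights, x):
    """h_j = max(0, sum_k W[j][k] * x[k]); biases omitted to keep the sketch small."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def goodness(h):
    """Forward-Forward 'goodness': the sum of squared activities."""
    return sum(v * v for v in h)


def normalize(h, eps=1e-4):
    """Hand only the direction of h to the next layer, so that layer cannot
    simply read off this layer's goodness (the squared length of h)."""
    norm = math.sqrt(goodness(h))
    return [v / (norm + eps) for v in h]


def local_update(weights, x, is_positive):
    """One gradient step on this layer's own loss; nothing comes from later layers.

    loss = softplus(-s * (g - THETA)), s = +1 for positive data and -1 for negative,
    which pushes goodness above THETA for positive data and below it for negative data.
    """
    h = relu_layer(weights, x)
    s = 1.0 if is_positive else -1.0
    margin = s * (goodness(h) - THETA)
    dloss_dg = -s / (1.0 + math.exp(margin))  # equals -s * sigmoid(-margin)
    for j, row in enumerate(weights):
        # dg/dW[j][k] = 2 * h_j * x_k, which is 0 for inactive units (h_j == 0)
        coef = LR * dloss_dg * 2.0 * h[j]
        for k in range(len(row)):
            row[k] -= coef * x[k]


def train(epochs=100, hidden=16, n=20, seed=0):
    rng = random.Random(seed)
    # Toy data (hypothetical): positives live on features 0-1, negatives on 2-3.
    pos = [[1 + 0.1 * rng.random(), 1 + 0.1 * rng.random(), 0.0, 0.0] for _ in range(n)]
    neg = [[0.0, 0.0, 1 + 0.1 * rng.random(), 1 + 0.1 * rng.random()] for _ in range(n)]
    layer1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(hidden)]
    layer2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
    # Greedy, layer by layer: train layer 1 on its local loss, then freeze it.
    for _ in range(epochs):
        for p, q in zip(pos, neg):
            local_update(layer1, p, True)
            local_update(layer1, q, False)
    # Layer 2 only sees layer 1's normalized output; nothing flows back into layer 1.
    for _ in range(epochs):
        for p, q in zip(pos, neg):
            local_update(layer2, normalize(relu_layer(layer1, p)), True)
            local_update(layer2, normalize(relu_layer(layer1, q)), False)
    return layer1, layer2, pos, neg


def mean_goodness(layer1, layer2, samples):
    """Average goodness of layer 1 and layer 2 over a list of samples."""
    g1 = [goodness(relu_layer(layer1, x)) for x in samples]
    g2 = [goodness(relu_layer(layer2, normalize(relu_layer(layer1, x)))) for x in samples]
    return sum(g1) / len(g1), sum(g2) / len(g2)


layer1, layer2, pos, neg = train()
print("positive (layer 1, layer 2):", mean_goodness(layer1, layer2, pos))
print("negative (layer 1, layer 2):", mean_goodness(layer1, layer2, neg))
```

After training, both layers should score positive samples above `THETA` and negative samples below it. Layers still have to pass activations forward, so this removes the backward traffic, not all traffic.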

Davidzheng 10 hours ago

Are you sure most of the frontier cost isn't inference in RL environments?

dyauspitr 14 hours ago

That makes no sense. They're basically the same calculations for training as well.
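On the "same calculations" point: both phases are dominated by the same dense matrix multiplies, but training adds a backward pass and has to keep activations and optimizer state around for it. A widely used rule of thumb for dense transformers is about 2N FLOPs per token for a forward pass and about 6N per token for training, where N is the parameter count. A minimal sketch of that approximation, with a hypothetical 7B-parameter model; it ignores the attention term that grows with sequence length and any activation recomputation.

```python
def forward_flops(num_params: float, tokens: float) -> float:
    """~2 FLOPs per parameter per token: one multiply and one add per weight."""
    return 2 * num_params * tokens


def training_flops(num_params: float, tokens: float) -> float:
    """Forward (~2N) plus backward (~4N, about twice the forward) per token."""
    return 6 * num_params * tokens


# Hypothetical 7B-parameter model.
n = 7e9
print(training_flops(n, 1) / forward_flops(n, 1))  # prints 3.0
```

So per token the math is the same kind of work, roughly 3x as much of it, plus the extra memory the backward pass needs.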