Continue to believe that Cerebras is one of the most underrated companies of our time. It's a dinner-plate sized chip. It actually works. It's actually much faster than anything else for real workloads. Amazing
I'm fascinated by how the economy is catching up to demand for inference. The vast majority of today's capacity comes from silicon that merely happens to be good at inference, and it's clear that there's a lot of room for innovation when you design silicon for inference from the ground up.
With CapEx going crazy, I wonder where costs will stabilize and what OpEx will look like once these initial investments are paid back (or go bust). The common consensus seems to be that there will be a rug pull and frontier model inference costs will spike, but I'm not entirely convinced.
I suspect it largely comes down to how much more efficient custom silicon is compared to GPUs, as well as how accurately the supply chain is able to predict future demand relative to future efficiency gains. To me, it is not at all obvious what will happen. I don't see any reason why a rug pull is any more or less likely than today's supply chain over-estimating tomorrow's capacity needs, and creating a hardware (and maybe energy) surplus in 5-10 years.
Nvidia seems cooked.
Google is crushing them on inference. By TPUv9, they could be 4x more energy efficient and cheaper overall (even if Nvidia cuts their margins from 75% to 40%).
Cerebras will be substantially better for agentic workflows in terms of speed.
And if you don't care as much about speed and only cost and energy, Google will still crush Nvidia.
And Nvidia won't be cheaper for training new models either. The vast majority of chips will be used for inference by 2028 instead of training anyway.
Nvidia has no manufacturing reliability story. Anyone can buy TSMC's output.
Power is the bottleneck in the US (and everywhere besides China). By TPUv9 - Google is projected to be 4x more energy efficient. It's a no-brainer who you're going with starting with TPUv8 when Google lets you run on-prem.
These are GW scale data centers. You can't just build 4 large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (maybe India).
What am I missing? I don't understand how Nvidia could've been so far ahead and just let every part of the market slip away.
> let every part of the market slip away.
Which part of the market has slipped away, exactly? Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn't have enough production capacity to serve. And their dev ecosystem is still so far ahead of anyone else. Which provider gets chosen to equip a 100k-chip data center goes far beyond raw chip performance.
If code is getting cheaper to write, CUDA alternatives and tooling shouldn't be far behind. I can't see Nvidia holding its position for much longer.
> Nvidia has a chokehold on the entire market.
You're obviously not looking at expected forward orders for 2026 and 2027.
I think most estimates have Nvidia at more or less stable share of CoWoS capacity (around 60%), which is ~doubling in '26.
> What am I missing?
Largest production capacity maybe?
Also, market demand will be so high that every player's chips will be sold out.
> Largest production capacity maybe?
Anyone can buy TSMC's output...
Which I'm sure is 100% reserved through at least 2030.
Aren't they building new fabs, though? Or even those are already booked?
Can anyone buy TSMC though?
No. TSMC will not take the risk on allocating capacity to just anyone given the opportunity cost.
Not without an army
Man I hope someone drinks Nvidia's milk shake. They need to get humbled back to the point where they're desperate to sell gpus to consumers again.
Only major road block is cuda...
What puzzles me is that AMD can't secure any meaningful size of AI market. They missed this train badly.
> What am I missing?
VRAM capacity given the Cerebras/Groq architecture compared to Nvidia.
In parallel, RAM contracts that Nvidia has negotiated well into the future that other manufacturers have been unable to secure.
I believe they licensed something from Groq.
Well they `acquired` groq for a reason.
It's "dinner-plate sized" because it's just a full silicon wafer. It's nice to see that wafer-scale integration is now being used for real work but it's been researched for decades.
If history has taught us anything, “engineered systems” (like mainframes & hyper converged infrastructure) emerge at the start of a new computing paradigm … but long-term, commodity compute wins the game.
I think that was true when you could rely on good old Moore’s law to make the heavy iron quickly obsolete but I also think those days are coming to an end
Technically, Cerebras' solution is really cool. However, I am skeptical that it will be economically useful for larger models, since the number of racks required scales with the size of the model in order to fit the weights in SRAM.
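A hypothetical back-of-envelope of that concern below; the ~44 GB of on-chip SRAM per wafer and the one-wafer-per-system assumption are my own rough numbers for illustration, not figures from this thread.

```python
# Rough sketch: wafers needed just to hold a model's weights in SRAM.
# SRAM-per-wafer and bytes-per-param are assumptions, not vendor specs.
import math

def wafers_needed(params_billion: float,
                  bytes_per_param: float = 2.0,     # FP16 weights
                  sram_gb_per_wafer: float = 44.0) -> int:
    """Wafers required to fit the weights entirely in on-chip SRAM."""
    weight_gb = params_billion * bytes_per_param    # 1e9 params * bytes ~= GB
    return math.ceil(weight_gb / sram_gb_per_wafer)

for size_b in (70, 400, 1000):
    print(f"{size_b:5}B params @ FP16 -> ~{wafers_needed(size_b)} wafers")
```

Even under generous assumptions, weight storage alone runs into dozens of wafer-scale systems for frontier-sized models.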
Just wish they weren't so insanely expensive...
The bigger the chip, the worse the yield.
Cerebras has effectively 100% yield on these chips. They have an internal structure made by just repeating the same small modular units over and over again. This means they can just fuse off the broken bits without affecting overall function. It's not like it is with a CPU.
I suggest reading their website; they explain pretty well how they manage good yield. Though I'm not an expert in this field, it does make sense, and I would be surprised if they were caught lying.
This comment doesn't make sense.
One wafer will turn into multiple chips.
Defects are best measured on a per-wafer basis, not per-chip. So if your chips are huge and you can only put 4 chips on a wafer, 1 defect can cut your yield by 25%. If they're smaller and you fit 100 chips on a wafer, then 1 defect on the wafer is only cutting yield by 1%. Of course, there's more to this when you start reading about "binning", fusing off cores, etc.
There's plenty of information out there about how CPU manufacturing works, why defects happen, and how they're handled. Suffice to say, the comment makes perfect sense.
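A toy illustration of that scaling, using a simple Poisson yield model; the defect density and die areas below are made-up round numbers, not real fab data.

```python
# Toy Poisson yield model: probability that a single die is defect-free.
# Defect density and die areas are illustrative guesses only.
import math

def defect_free_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    return math.exp(-die_area_cm2 * defects_per_cm2)

D0 = 0.1  # assumed defects per cm^2
for name, area in [("small die", 1.0),
                   ("big GPU-class die", 8.0),
                   ("whole wafer as one die", 462.0)]:
    print(f"{name:24s} {area:7.1f} cm^2 -> {defect_free_yield(area, D0):6.1%}")
```

At wafer scale the defect-free probability is effectively zero, which is exactly why the defect tolerance discussed elsewhere in this thread matters.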
That's why you typically fuse off defective sub-units and just have a slightly slower chip. GPU and CPU manufacturers have done this for at least 15 years now, that I'm aware of.
Sure it does. If it’s many small dies on a wafer, then imperfections don’t ruin the entire batch; you just bin those components. If the entire wafer is a single die, you have much less tolerance for errors.
Although, IIUC, Cerebras expects some amount of imperfection and can adjust the hardware (or maybe the software) to avoid those components after they're detected. https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...
You can just do dynamic binning.
Bigger chip = more surface area = higher chance for somewhere in the chip to have a manufacturing defect
Yields on silicon are great, but not perfect
Does that mean smaller chips are made from smaller wafers?
They can be made from large wafers. A defect typically breaks whatever chip it's on, so one defect on a large wafer filled with many small chips will still just break one chip of the many on the wafer. If your chips are bigger, one defect still takes out a chip, but now you've lost more of the wafer area because the chip is bigger. So you get a super-linear scaling of loss from defects as the chips get bigger.
With careful design, you can tolerate some defects. A multi-core CPU might have the ability to disable a core that's affected by a defect, and then it can be sold as a different SKU with a lower core count. Cerebras uses an extreme version of this, where the wafer is divided up into about a million cores, and a routing system that can bypass defective cores.
They have a nice article about it here: https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...
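A rough sketch of why routing around tiny cores helps, under the same kind of assumed defect density as above (all numbers illustrative, not Cerebras specs):

```python
# With ~1M tiny cores, the same defect density only knocks out a handful
# of cores instead of the whole die. All numbers are illustrative.
import math

DEFECTS_PER_CM2 = 0.1     # assumed defect density
WAFER_AREA_CM2 = 462.0    # roughly wafer-scale
N_CORES = 900_000         # "about a million" tiny cores

core_area = WAFER_AREA_CM2 / N_CORES
p_bad = 1.0 - math.exp(-DEFECTS_PER_CM2 * core_area)   # per-core defect prob.
print(f"expected defective cores: ~{N_CORES * p_bad:.0f} of {N_CORES}")
print(f"usable compute after routing around them: ~{(1 - p_bad) * 100:.3f}%")
```

Losing a few dozen cores out of nearly a million is noise, whereas losing a whole monolithic die to one defect is not.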
Nope. They use the same size wafers and then just put more chips on a wafer.
So, does a wafer with a huge chip have more defects per area than a wafer with 100s of small chips?
There's an expected number of defects per wafer. If a chip has a defect, then it is lost (simplification). A wafer with 100 chips may lose 10 to defects, giving a yield of 90%. The same wafer but with 1000 smaller chips would still have lost only 10 of them, giving 99% yield.
As another comment in this thread states, Cerebras seems to have solved this by building their big chip out of a lot of much smaller cores that can be disabled if they have defects.
Indeed, the original comment you replied to actually made no sense in this case. But there seemed to be some confusion in the thread, so I tried to clear that up. I hope I’ll get to talk with one of the cerebras engineers one day, that chip is really one of a kind.
You say this with such confidence and then ask if smaller chips require smaller wafers.
Not for what they are using it for. It is $1m+/chip and they can fit 1 of them in a rack. Rack space in DCs is a premium asset. The density isn't there. AI models need tons of memory (this product announcement is a case in point) and they don't have it, nor do they have a way to get it since they are last in line at the fabs.
Their only chance is an acqui-hire, but Nvidia just spent $20b on Groq instead. Dead man walking.
The real question is what’s their perf/dollar vs nvidia?
That's coupling two different usecases.
Many coding usecases care about tokens/second, not tokens/dollar.
I guess it depends what you mean by "perf". If you optimize everything for the absolutely lowest latency given your power budget, your throughput is going to suck - and vice versa. Throughput is ultimately what matters when everything about AI is so clearly power-constrained, latency is a distraction. So TPU-like custom chips are likely the better choice.
By perf I mean how much does it cost to serve 1T model to 1M users at 50 tokens/sec.
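For anyone who wants to plug in their own numbers, here's a hedged sketch of that arithmetic; the concurrency factor and per-token prices are pure assumptions, not anyone's published pricing.

```python
# Back-of-envelope serving cost; concurrency and prices are assumptions.
users = 1_000_000
tokens_per_active_user_per_s = 50
concurrency = 0.05                      # assume ~5% of users generating at once

tok_per_s = users * concurrency * tokens_per_active_user_per_s
tok_per_month = tok_per_s * 86_400 * 30

for usd_per_mtok in (0.5, 2.0, 10.0):   # assumed blended $/1M output tokens
    cost = tok_per_month / 1e6 * usd_per_mtok
    print(f"${usd_per_mtok:>4}/Mtok -> ~${cost:,.0f}/month")
```

The real comparison is which hardware can hit a given $/Mtok while still delivering the 50 tokens/sec per user, which is exactly where the throughput-vs-latency tradeoff above comes in.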
Not all 1T models are equal. E.g., how many active parameters? What's the native quantization? How long is the max context? Also, it's quite likely that some smaller models in common use are even sub-1T. If your model is light enough, the lower throughput doesn't necessarily hurt you all that much and you can enjoy the lightning-fast speed.
Just pick some reasonable values. Also, keep in mind that this hardware must still be useful 3 years from now. What’s going to happen to cerebras in 3 years? What about nvidia? Which one is a safer bet?
On the other hand, competition is good - nvidia can’t have the whole pie forever.
> Just pick some reasonable values.
And that's the point - what's "reasonable" depends on the hardware and is far from fixed. Some users here are saying that this model is "blazing fast" but a bit weaker than expected, and one might've guessed as much.
> On the other hand, competition is good - nvidia can’t have the whole pie forever.
Sure, but arguably the closest thing to competition for nVidia is TPUs and future custom ASICs that will likely save a lot on energy used per model inference, while not focusing all that much on being super fast.
AMD
[dead]
> Throughput is ultimately what matters
I disagree. Yes it does matter, but because the popular interface is via chat, streaming the results of inference feels better to the squishy messy gross human operating the chat, even if it ends up taking longer. You can give all the benchmark results you want, humans aren't robots. They aren't data driven, they have feelings, and they're going to go with what feels better. That isn't true for all uses, but time to first byte is ridiculously important for human-computer interaction.
You just have to change the "popular interface" to something else. Chat is OK for trivia or genuinely time-sensitive questions, everything else goes through via email or some sort of webmail-like interface where requests are submitted and replies come back asynchronously. (This is already how batch APIs work, but they only offer a 50% discount compared to interactive, which is not enough to really make a good case for them - especially not for agentic workloads.)
Or Google TPUs.
TPUs don't have enough memory either, but they have really great interconnects, so they can build a nice high density cluster.
Compare the photos of a Cerebras deployment to a TPU deployment.
https://www.nextplatform.com/wp-content/uploads/2023/07/cere...
https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iOLs2FEQxQv...
The difference is striking.
Oh wow the cabling in the first link is really sloppy!
Exactly. They won't ever tell you. It is never published.
Let's not forget that the CEO is an SEC felon who got caught trying to pull a fast one.
Power/cooling is the premium.
Can always build a bigger hall
Exactly my point. Their architecture requires someone to invest the capex / opex to also build another hall.
Oh don't worry. Ever since the power issue started developing rack space is no longer at a premium. Or at least, it's no longer the limiting factor. Power is.
The dirty secret is that there is plenty of power. But, it isn't all in one place and it is often stranded in DC's that can't do the density needed for AI compute.
Training models needs everything in one DC, inference doesn't.
Yet investors keep backing NVIDIA.
At this point Tech investment and analysis is so divorced from any kind of reality that it's more akin to lemmings on the cliff than careful analysis of fundamentals
yep
Cerebras is a bit of a stunt like "datacenters in spaaaaace".
Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf./cost (see above). Difficult to program. Little space for RAM.
They claim the opposite, though, saying the chip is designed to tolerate many defects and work around them.