Wow, I wish we could post pictures to HN. That chip is HUGE!!!!
The WSE-3 is the largest AI chip ever built, measuring 46,225 mm² and containing 4 trillion transistors. It delivers 125 petaflops of AI compute through 900,000 AI-optimized cores: 19× more transistors and 28× more compute than the NVIDIA B200.
From https://www.cerebras.ai/chip:
https://cdn.sanity.io/images/e4qjo92p/production/78c94c67be9...
https://cdn.sanity.io/images/e4qjo92p/production/f552d23b565...
Why is the CEO some shady guy, though? https://daloopa.com/blog/analyst-pov/cerebras-ipo-red-flags-...
"AI" always has some sleazy person behind it for some reason
*thousands, not thousandths, right?
The correct number is forty-six thousand, two hundred and twenty-five square mm.
Thanks, I was actually wondering how someone would even manage to make that big a chip.
Wow, I'm staggered, thanks for sharing
I was under the impression that top-of-the-line chips often fail to be manufactured perfectly to spec, and that those with, say, a core that was a bit under spec or missing entirely would be downclocked or whatever and sold as the next chip down the line.
Is that not a thing anymore? Or would a chip like this maybe be so specialized that you'd use, say, a generation-earlier transistor width and thus have more certainty of a successful yield?
Or does a chip this size just naturally land around 900,000 cores, and that's not always the exact count?
20 kW! Wow! 900,000 cores. 125 petaflops of compute. Very neat
Designing to tolerate the defects is well-trodden territory. You just expect some rate of defects and have a way of disabling failing blocks.
So you shoot for 10% more cores and disable failing cores?
More or less, yes. Of course, defects are not evenly distributed, so you get a lot of chips with different grades of brokenness. Normally the more broken chips get sold off as lower-tier products. A six-core CPU is probably an eight-core part with two broken cores.
Though in this case, it seems [1] that Cerebras just has so many small cores that they can expect a fairly consistent number of broken cores and route around them.
[1]: https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...
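If you want intuition for why tiny cores change the yield math so dramatically, here's a back-of-the-envelope sketch using a simple Poisson defect model (the defect density is a made-up illustrative number, not real foundry data):

    import math

    # Illustrative numbers only -- not real foundry data.
    defect_density = 0.1              # random defects per mm^2
    wafer_area = 46_225               # WSE-3 silicon area, mm^2
    core_area = wafer_area / 900_000  # ~0.05 mm^2 per tiny core

    # Poisson model: P(a region of area A is defect-free) = exp(-D * A)
    p_core_ok = math.exp(-defect_density * core_area)
    bad_cores = 900_000 * (1 - p_core_ok)
    print(f"P(core clean) = {p_core_ok:.5f}")        # ~0.99487
    print(f"expected bad cores = {bad_cores:,.0f}")  # ~4,600 of 900,000

    # A monolithic die of the same area would essentially never yield:
    print(f"P(whole wafer clean) = {math.exp(-defect_density * wafer_area):.3g}")

Under those made-up numbers you lose only about half a percent of cores, so carrying a small margin of spares and routing around the duds keeps essentially every wafer usable.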
Well, it's more like they have 900,000 cores on a WSE and disable whatever ones that don't work.
Seriously, that's literally just what they do.
IIRC, a lot of design went into making it so that you can disable parts of this chip selectively.
There have been discussions about this chip here in the past. Maybe not that particular one, but previous versions of it. The whole server, if I remember correctly, eats some 20 kW of power.
A first-gen Oxide Computer rack puts out max 15 kW of power, and they manage to do that with air cooling. The liquid-cooled AI racks being used today for training and inference workloads almost certainly have far higher power output than that.
(Bringing liquid cooling to the racks likely has to be one of the biggest challenges with this whole new HPC/AI datacenter infrastructure, so the fact that an air-cooled rack can just sit in almost any ordinary facility is a non-trivial advantage.)
Well, for some. Google has been bringing liquid cooling to its racks for decades.
That’s wild. That’s like running 15 indoor heaters at the same time.
20KW? Wow. That's a lot of power. Is that figure per hour?
What do you mean by "per hour"?
Watt is a measure of power, that is a rate: Joule/second, [energy/time]
> The watt (symbol: W) is the unit of power or radiant flux in the International System of Units (SI), equal to 1 joule per second or 1 kg⋅m²⋅s⁻³. It is used to quantify the rate of energy transfer.
https://en.wikipedia.org/wiki/Watt
If you run it for an hour, yes.
I asked because that's the average power consumption of an average household in the US per day. So, if that figure is per hour, that's equivalent to one household worth of power consumption per hour...which is a lot.
Others clarified the kW versus kWh, but to re-visit the comparison to a household:
One household uses about 30 kWh per day.
20 kW × 24 h = 480 kWh per day for the server.
So you're looking at one server (if parent's 20kW number is accurate - I see other sources saying even 25kW) consuming 16 households worth of energy.
For comparison, a hair dryer draws around 1.5 kW of power, which is just below the rating for most US home electrical circuits. This is something like 13 hair dryers going on full blast.
Which honestly doesn't sound that bad given how many users one server is able to serve.
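For anyone who wants to check the arithmetic, here it is in a few lines of Python, using the thread's round numbers (20 kW server, 30 kWh/day household, 1.5 kW hair dryer):

    server_kw = 20.0          # claimed server draw (some sources say 25)
    household_kwh_day = 30.0  # rough US daily household usage
    hair_dryer_kw = 1.5       # typical full-blast hair dryer

    kwh_per_day = server_kw * 24            # energy = power * time
    print(kwh_per_day)                      # 480.0 kWh/day
    print(kwh_per_day / household_kwh_day)  # 16.0 households
    print(server_kw / hair_dryer_kw)        # ~13.3 hair dryers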
Consumption of a house per day is measured in kilowatt-hours (an amount of energy, like litres of water), not kilowatts (a rate of energy flow, like litres per second of water).
1 Watt = 1 Joule per second.
Thanks!
I think you are confusing kW (kilowatt) with kWh (kilowatt-hour).
A kW is a unit of power while a kWh is a unit of energy. Power is a measure of energy transferred in an amount of time, which is why you rate an electronic device’s energy usage using power; it consumes energy over time.
In terms of paying for electricity, you care about the total energy consumed, which is why your electric bill is denominated in kWh, which is the amount of energy used if you use one kilowatt of power for one hour.
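To make that concrete, a tiny sketch of how a bill falls out of the two units (the $0.15/kWh rate is just an assumed example; real rates vary a lot):

    power_kw = 20.0      # power: the rate of energy use
    hours = 24 * 30      # one month of continuous operation
    usd_per_kwh = 0.15   # assumed rate; varies a lot by region

    energy_kwh = power_kw * hours     # kWh = kW * h
    print(energy_kwh)                 # 14400.0 kWh for the month
    print(energy_kwh * usd_per_kwh)   # 2160.0 dollars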
You're right, I absolutely was mixing them both. Thanks for clarifying!
It’s 20kW for as long as you can afford the power bill
20 kWh per hour
Is all of it one chip? Seems like a wafer with several at least?
Those are scribe lines, where you would usually cut the wafer into individual chips, which is why it resembles multiple chips. However, they work with TSMC to etch wiring across them.
Maybe I'm silly, but why is this relevant to GPT-5.3-Codex-Spark?
It’s the chip they’re apparently running the model on.
> Codex-Spark runs on Cerebras’ Wafer Scale Engine 3—a purpose-built AI accelerator for high-speed inference giving Codex a latency-first serving tier. We partnered with Cerebras to add this low-latency path to the same production serving stack as the rest of our fleet, so it works seamlessly across Codex and sets us up to support future models.
https://www.cerebras.ai/chip
That's what it's running on. It's optimized for very high throughput using Cerebras' hardware which is uniquely capable of running LLMs at very, very high speeds.
Is this actually more beneficial than, say, having a bunch of smaller chips communicating over a bus? Apart from space constraints, that is.
It's a single wafer, not a single compute core. A familiar equivalent might be putting 192 cores in a single Epyc CPU (or, to be more technically accurate, the group of cores in a single CCD) rather than trying to interconnect 192 separate single-core CPUs externally with each other.
Yes, bandwidth within a chip is much higher than on a bus.
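To put rough numbers on it (both figures are hedged, order-of-magnitude assumptions: the on-wafer one is in the ballpark of Cerebras' published bandwidth claims, the off-chip one is an NVLink-class link):

    # Order-of-magnitude illustration; both figures are rough assumptions.
    on_wafer_bw = 21e15  # bytes/s, ballpark of Cerebras' claimed on-wafer bandwidth
    off_chip_bw = 900e9  # bytes/s, an NVLink-class external link

    weights = 140e9      # bytes: a 70B-parameter model at 2 bytes/weight

    # Time to stream every weight once, a crude proxy for one decode step:
    print(f"on-wafer: {weights / on_wafer_bw * 1e6:.1f} us")  # ~6.7 us
    print(f"off-chip: {weights / off_chip_bw * 1e3:.1f} ms")  # ~155.6 ms

That gap is the whole pitch: if the weights never have to cross a narrow external link, per-token latency collapses.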
> Wow, I wish we could post pictures to HN. That chip is HUGE!!!!
Using a wafer-sized chip doesn't sound great from a cost perspective when compared to using many smaller chips for inference. Yield will be much lower and prices higher.
Nevertheless, the actual price might not be very high if Cerebras doesn't apply an Nvidia-level tax.
As I understand it, the chip consists of a huge number of processing units with a mesh network between them, so to speak, and it can tolerate disabling a number of units by routing around them.
Speed will suffer, but it's not like a stuck pixel on an 8k display rendering the whole panel useless (to consumers).
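A toy version of that "route around the dead units" idea on a 2D mesh, using plain BFS (real wafer routing is far more sophisticated; this just shows the concept):

    from collections import deque

    def route(w, h, dead, src, dst):
        """BFS a path through a w x h mesh, treating dead cores as holes."""
        frontier, seen = deque([(src, [src])]), {src}
        while frontier:
            (x, y), path = frontier.popleft()
            if (x, y) == dst:
                return path
            for step in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (0 <= step[0] < w and 0 <= step[1] < h
                        and step not in dead and step not in seen):
                    seen.add(step)
                    frontier.append((step, path + [step]))
        return None  # region unreachable: too many dead neighbours

    # 5x5 mesh with three dead cores blocking the middle column;
    # traffic detours around them instead of scrapping the chip.
    print(route(5, 5, {(2, 1), (2, 2), (2, 3)}, (0, 2), (4, 2)))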
> Yield will be much lower and prices higher.
That's an intentional trade-off in the name of latency. We're going to see a further bifurcation in inference use-cases in the next 12 months. I'm expecting this distinction to become prominent:
(A) Massively parallel (optimize for token/$)
(B) Serial low latency (optimize for token/s).
Users will switch between A and B depending on need.
An example of (A):
- "Search this 1M line codebase for DRY violations subject to $spec."
Examples of (B):
- "Diagnose this one specific bug."
- "Apply this diff".
(B) is used in funnels to unblock (A). (A) is optimized for cost and bandwidth, (B) is optimized for latency. (A toy dispatcher sketch below makes the split concrete.)
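The sketch (the Task fields and the 50k-token threshold are invented for illustration, not any real API):

    from dataclasses import dataclass

    @dataclass
    class Task:
        est_input_tokens: int  # how much context the task must read
        interactive: bool      # is a human actively waiting?

    def choose_tier(task: Task) -> str:
        """Pick (A) batch throughput or (B) low-latency serving."""
        if task.interactive and task.est_input_tokens < 50_000:
            return "B"  # "diagnose this bug", "apply this diff"
        return "A"      # "scan this 1M-line codebase for DRY violations"

    print(choose_tier(Task(2_000, True)))     # B
    print(choose_tier(Task(900_000, False)))  # A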
Cerebras addresses this in a blog post: https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...
Basically they use very small cores compared to competitors, so faults only affect small areas.
Wooshka.
I hope they've got good heat sinks... and I hope they've plugged into renewable energy feeds...
Fresh water and gas turbines, I'm afraid...
Nope! It's gas turbines