Designing to tolerate defects is well-trodden territory. You just expect some rate of defects and build in a way of disabling failing blocks.

So you shoot for 10% more cores and disable failing cores?

More or less, yes. Of course, defects are not evenly distributed, so you get a lot of chips with different grades of brokenness. Normally the more broken chips get sold off as lower-tier products. A six-core CPU is probably an eight-core with two broken cores.
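To put rough numbers on the binning idea, here is a minimal sketch. Everything in it is an assumption for illustration: the 5% per-core failure rate, the 8/6-core tiers, and the helper names `working_core_distribution` and `bin_shares`. It also assumes cores fail independently, which real defects do not (they cluster), so treat it as a toy model rather than how any fab actually bins.

```python
from math import comb


def working_core_distribution(total_cores: int, p_core_bad: float) -> dict[int, float]:
    """Probability of each working-core count, assuming independent core failures."""
    return {
        total_cores - bad: comb(total_cores, bad)
        * p_core_bad**bad
        * (1 - p_core_bad) ** (total_cores - bad)
        for bad in range(total_cores + 1)
    }


def bin_shares(dist: dict[int, float], tiers: list[int]) -> dict:
    """Put each die in the highest tier it can satisfy; anything below the lowest tier is scrap."""
    ordered = sorted(tiers, reverse=True)
    shares: dict = {t: 0.0 for t in ordered}
    shares["scrap"] = 0.0
    for working, prob in dist.items():
        tier = next((t for t in ordered if working >= t), "scrap")
        shares[tier] += prob
    return shares


# Hypothetical product line: an 8-core part and a 6-core part cut from the same die.
dist = working_core_distribution(total_cores=8, p_core_bad=0.05)
shares = bin_shares(dist, tiers=[8, 6])
for tier, share in shares.items():
    print(f"{tier}: {share:.1%}")
```

With these made-up numbers, most dies that miss the 8-core bin still land in the 6-core bin, and only a small remainder is scrapped.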

Though in this case, it seems [1] that Cerebras just has so many small cores that they can expect a fairly consistent level of broken cores and route around them.
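The "fairly consistent level" part can be sketched with basic binomial statistics: the more cores you have, the smaller the spread in the broken fraction relative to its average. The 1% per-core failure rate and the function name `bad_core_stats` below are assumptions made up for illustration, and independent failures are again assumed; the 900,000 figure is the core count mentioned elsewhere in this thread.

```python
from math import sqrt


def bad_core_stats(n_cores: int, p_core_bad: float) -> tuple[float, float, float]:
    """Mean, standard deviation, and relative spread of the bad-core count (binomial model)."""
    mean = n_cores * p_core_bad
    std = sqrt(n_cores * p_core_bad * (1 - p_core_bad))
    return mean, std, std / mean


# Compare a conventional 8-core die with a wafer-scale part at the same assumed failure rate.
for n in (8, 900_000):
    mean, std, rel = bad_core_stats(n, p_core_bad=0.01)
    print(f"{n} cores: expect {mean:.2f} bad, std {std:.2f}, relative spread {rel:.2%}")
```

Under this model the 8-core die swings between "perfect" and "one core lost", while the huge array loses close to the same fraction every time, which is what makes a fixed pool of spares workable.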

[1]: https://www.cerebras.ai/blog/100x-defect-tolerance-how-cereb...

Well, it's more like they have 900,000 cores on a WSE and disable whichever ones don't work.

Seriously, that's literally just what they do.