OpenAI also announced two days ago that they're starting to make Cerebras-style chips themselves [0]. It will be interesting to see how fast SotA model inference is by the end of the year.

[0]: https://openai.com/index/openai-broadcom-jalapeno-inference-...

I don't understand how you refer to this as "Cerebras-style". Cerebras is wafer-scale and unique. Jalapeno is an inference-optimized conventional chip.

Cerebras is different from what Jalapeno is.

Jalapeno is for mass-scale inference.

Cerebras is extremely expensive and difficult to scale, hence the limited release.

Even if their chip is a difference-maker, end of the year is way too optimistic. At minimum, it’ll be a multi-year effort to bring it to production at scale.

I don't see any indications that OpenAI is doing wafer-scale work.

I tend to doubt they would. Cerebras notably doesn't have a KV cache; it is wildly high-bandwidth, but only within/across the chip, and it isn't able to dump/restore KV very well. I doubt OpenAI is going to build something that is as expensive to run. Also, wafer-scale is absurdly hard & weird to pull off, so I doubt that would be their first foray.