DeepSeek's inference efficiency comes from two things: MoE and MLA attention. OpenAI was rumored to be using MoE around the GPT-4 era, i.e. a loooong time ago.
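For intuition on why MLA saves so much at inference time, here's a toy sketch of the idea: instead of caching full per-head K/V for every token, you cache one small latent per token and up-project it to K/V at attention time. The dims below are made up for illustration, not DeepSeek's actual config, and this skips the decoupled RoPE part of real MLA.

```python
import torch
import torch.nn as nn

d_model, n_heads, d_head, d_latent = 4096, 32, 128, 512  # hypothetical sizes

w_down = nn.Linear(d_model, d_latent, bias=False)           # shared down-projection
w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # per-head K from latent
w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # per-head V from latent

h = torch.randn(1, 1024, d_model)  # (batch, seq, hidden)
latent = w_down(h)                 # cache THIS: (1, 1024, 512)
k = w_up_k(latent).view(1, 1024, n_heads, d_head)
v = w_up_v(latent).view(1, 1024, n_heads, d_head)

# KV cache per token: 512 floats vs 2 * 32 * 128 = 8192 for vanilla MHA (~16x smaller)
print(d_latent, "vs", 2 * n_heads * d_head)
```

The KV cache is usually what caps batch size (and therefore throughput) at long context, which is why shrinking it ~16x translates directly into cheaper serving.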
Given Gemini's efficiency with long context, I would bet their attention is very efficient too.
GPT-OSS uses FP4 (MXFP4), which DeepSeek doesn't use yet, btw.
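For a feel of what FP4 buys you: e2m1 has only 16 representable values, so weights take 4 bits each plus a shared scale per block (MXFP4 uses blocks of 32 with a power-of-2 scale, as I understand it). Here's a toy round-to-nearest quantizer, not a real packed kernel, and with a plain float scale instead of MXFP4's power-of-2 one:

```python
import torch

# The 16 e2m1 values: +/- {0, 0.5, 1, 1.5, 2, 3, 4, 6}
E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1 = torch.cat([E2M1, -E2M1])

def quantize_block(w: torch.Tensor) -> torch.Tensor:
    """Round a block of weights to the nearest scaled e2m1 value."""
    scale = w.abs().max() / 6.0 + 1e-12          # map block range onto [-6, 6]
    idx = (w / scale).unsqueeze(-1).sub(E2M1).abs().argmin(-1)
    return E2M1[idx] * scale                      # dequantized view

w = torch.randn(32)                               # one MXFP4-sized block
print((quantize_block(w) - w).abs().mean())       # mean quantization error
```

4 bits per weight is half of FP8, so the same MoE fits in half the memory and moves half the bytes per token, which is a real serving advantage DeepSeek isn't taking yet.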
So no, the big labs aren't behind DeepSeek in efficiency. Not by much, at least.