Hacker News

I don't mean to throw shade, but plenty of the details they did release suggest that they don't know what they're doing.

They make comparisons to FlashAttention-2 when FlashAttention-4 has been out (even if they wanted to stick to Hopper-class GPUs for whatever reason, there's still FlashAttention-3). The two-orders-of-magnitude claim looks like it's for prefill, not next-token decoding, which is a bit duplicitous. Long-context extrapolation experiments typically go well beyond 2x context length. Etc etc etc.

I never said they should have a full public disclosure, but I do think sharing something of substance helps build trust and also get people excited.

Lastly, frontier labs have incentives other than eking out every dollar and cent. Having the most capable models, not the most cost-effective ones, is a significantly higher priority as OpenAI and Anthropic march towards IPOs. The same is not necessarily true for Google/DeepMind, and one can see from their public releases alone, including some of their open-weight models, that cost-effectiveness may be more of a priority for them today.