Hacker News

You don't understand why the thing their entire company is valued upon is...not being given away freely? They literally are taking an open source model and then adapting it with this technique. If they disclose it, the frontier labs will immediately copy it and outperform them.

My guess is that they're angling for an acquisition.

GenerWork 4 hours ago

>My guess is that they're angling for an acquisition.

This is what I've thought was going to happen ever since they publicized their efforts. They probably don't have the money to train large models themselves, so they might as well get a nice chunk of change by being acquired by someone who already has said large models running.

giancarlostoro 4 hours ago

They probably don't have the money to run the model at reasonable scale.

cmogni1 3 hours ago

Ahh cf my comment above. The cost of failure at scale is too high for a major to just take a new architecture/mechanism and implement it, especially because a) most claims papers make aren't rigorously tested and b) plenty of things that work at one scale do not work at the scale on which the labs operate. If they want to get acquired, then they should show that they know what they're doing. Otherwise, it looks sketchy.

supern0va 2 hours ago

>The cost of failure at scale is too high for a major to just take a new architecture/mechanism and implement it,

Is it, though? This scrappy startup was able to take a large(-ish) open weights model and adapt it. Why can't the frontier labs do the same cost effectively?

>If they want to get acquired, then they should show that they know what they're doing.

I'm sure they would do so under an appropriate NDA as part of negotiations. I'm not sure why you think a full public disclosure is necessary.

cmogni1 35 minutes ago

I don't mean to be shady, but there are plenty of details that they did release that show that they don't know what they're doing.

They make comparisons to FlashAttention-2 when FlashAttention-4 has been out (even if they wanted to stick to Hopper-class GPUs for whatever reason, there's still FlashAttention-3). The two-orders-of-magnitude claim looks like it's for prefill, not next-token decoding, which is a bit duplicitous. Long-context extrapolation experiments typically go well beyond 2x context length. Etc etc etc.

I never said they should have a full public disclosure, but I do think sharing something of substance helps build trust and also get people excited.

Lastly, frontier labs have incentives other than eking out every dollar and cent. Having the most capable models, not the most cost-effective ones, is a significantly higher priority as OpenAI and Anthropic march towards IPOs. The same is not necessarily true for Google/DeepMind, and one can see from their public releases alone, for some of their open-weight models, that cost efficiency may be more of a priority for them today.