Hacker News

Fable had mostly the same pre-training data as Opus, and it's likely they're distilled from the same source. The difference is that it's a larger model with more post-training on "dangerous" stuff they didn't want in the core model, plus "long" task RL.

gordonhart 7 hours ago

> it's likely they're distilled from the same source

Any credible references for this? The implication that Anthropic has an even bigger and better model that they haven't released is hard to believe.

CuriouslyC 6 hours ago

Lab folks keep cards close to their chests here, but it's likely Mythos was an earlier teacher model for Opus that got additional cybersec post-training. Whether they have a bigger tier than that is hard to say; labs have been cautiously scaling parameters since the failure of GPT4.1. They 100% have a bigger/better model they haven't released, but that's probably more down to it not being done cooking yet. Once it's done, the single larger model lets them drop new Opus and Mythos iterations in rapid succession.

Googlers have hinted that Gemini 3 came in at 10T, which seems hard to operationalize. Google's flash and pro releases are staggered in a way that doesn't make sense if flash is a pro distill, and there are enough cases where Gemini flash outperforms pro on the same task that I think it's unlikely it's just being distilled from an "in progress" version of pro.

gordonhart 4 hours ago

Appreciate the long answer. Why is it more likely that Gemini 3 Pro/Flash/Lite are distillations of the same parent model than that they’re different training runs on the same dataset, with minor version bumps being different post-training setups?

CuriouslyC 3 hours ago

The biggest tell is how far labs are staggering smaller model releases from big model releases. If the small models (flash, sonnet/haiku) were being distilled from pro models, you'd consistently see them released fairly soon after new pro releases to maximize their competitiveness (and this was the case early on for Anthropic). Instead it seems like releases are timed to build/maintain hype.

A thing to keep in mind is that if they release a smaller model halfway between well-spaced big model releases, why wait so long on the next big model release if it's sufficiently ready to distill to a smaller model? The ability to demonstrate AI superiority is worth a ton; there's no reason to hold back.