While I am excited to see a new model, I am skeptical when there is so much vagueness: charts that compare against "frontier models" without spelling out which ones, and charts with no numbers (on the time axis or, in one chart, anywhere at all).

There is a footnote that should help identify the models. Training is harder to report on, but roughly, our finding here is that RL scales.