Unless I'm missing something, this argument seems to apply only to the original pretraining era (e.g. GPT-1 through GPT-4). The post-training and reinforcement learning paradigms are clearly doing variation, evaluation and selective retention, no?

The transcript does seem to overlook post-training steps like Reinforcement Learning with Verifiable Rewards (RLVR) (but I certainly won't claim that Rich Sutton is unaware of such things; RLVR has a very narrow set of evaluation approaches).

I wonder if this is a precursor to Keen Tech leaning into David Silver's Ineffable Intelligence approach.

This was exactly what I was thinking of. RLVR is the secret sauce behind o3 and its many successors.

It's the secret sauce behind why the current models are so great at coding and soon to be unbeatable at math.

LLMs can pose many questions and, if the answers are easily verifiable, fine-tune very heavily on them. A lot of the world models discussion will inevitably lean into simulations as verification.
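
To make the "pose, verify, fine-tune" loop concrete, here is a toy sketch. Everything in it is an assumption for illustration: `propose_answers` is a hypothetical stand-in for sampling from a model, the task is plain addition, and the loop only filters verified samples (rejection-sampling style) rather than doing the policy-gradient updates real RLVR pipelines use.

```python
import random

def propose_answers(a, b, n, rng):
    # Hypothetical stand-in for sampling n answers from a model:
    # the true sum plus some noise, so only some samples are correct.
    return [a + b + rng.choice([-1, 0, 0, 1]) for _ in range(n)]

def verify(a, b, answer):
    # The "verifiable reward": a cheap programmatic check, 1.0 or 0.0.
    return 1.0 if answer == a + b else 0.0

def collect_verified(problems, samples_per_problem=8, seed=0):
    # Variation: sample several answers per problem.
    # Evaluation: score each one with the verifier.
    # Selective retention: keep only the ones that pass.
    rng = random.Random(seed)
    kept = []
    for a, b in problems:
        for answer in propose_answers(a, b, samples_per_problem, rng):
            if verify(a, b, answer) == 1.0:
                kept.append(((a, b), answer))
    return kept

if __name__ == "__main__":
    data = collect_verified([(1, 2), (3, 4)])
    print(f"kept {len(data)} verified samples for fine-tuning")
```

Swap the verifier for a unit-test runner, a proof checker, or a simulator and the shape of the loop stays the same, which is roughly why simulations come up as the verification step.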

I'll admit that I miss having access to the ChatGPT 4.5 "absolutely gigantic model" with enough tuning to make it sane and useful. The RLVR models are superb for actual tasks in those RLVR domains, but that fine-tuned view of the world as a verifiable problem to solve makes them feel worse for touchy feely stuff. Even for medical consultation and diagnosis, an RLVR model's urge to reach a conclusion is often a liability.

Fable 5/Mythos 5 is the next "big chungus LLM".

It's RLVR-tuned, but not to the ChatGPT level of brain damage, and it's still backed by a fuck-off huge pool of model weights - which matters for what you call "touchy feely stuff".