> This suggests that scaling alone won't eliminate incoherence. As more capable models tackle harder problems, variance-dominated failures persist or worsen.

This is a big deal, but are they only looking at autoregressive models?