> the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.

That you can't "become Schwartz" by using LLMs is an unproven assumption. In fact, it contradicts the logic of the essay: if Bob managed to produce valid output by using an LLM at all, then he must have acquired precisely the supervision ability that the essay claims is necessary.

By the way, note that in the thought experiment Bob isn't just delegating all the work to the LLM. He has it summarise articles, extract the important knowledge, and clarify concepts. That is part of a learning process, not passive consumption.

There's no contradiction: the point is that Bob is able to produce valid output using LLMs, but only while he himself is being supervised, and that he doesn't develop the skills to supervise independently in the future.

> only while he himself is being supervised

No, this is impossible unless Bob is simply presenting the LLM's raw output at each weekly meeting and feeding the tutor's feedback straight back into it. That would amount to about ten minutes of work per week, and the tutor would notice straight away, if only from the lack of progress.

No, the article specifies that Bob actually works with the LLM rather than just delegating to it. He asks the agent to summarise, to explain, and to help with bug fixing. You could easily argue that Bob, with such an AI tutor available 24/7, can develop understanding much faster. He certainly won't waste his time on small details of Python syntax (and working with a "coding expert" will make his code much cleaner and more advanced).

This is the rub: Bob would not be promoted if he consistently provided unreliable LLM output. To get promoted, Bob needs to learn the skills that get reliable output out of an LLM. These may not be the same skills that Alice learns, but if the argument is that Schwartz's LLM output is valuable, why are we to assume Bob's path isn't towards Schwartz?

It doesn't contradict the logic of the essay.

There are flowers that look & smell like female wasps well enough to fool male wasps into "mating" with them. But they don't fly off and lay wasp eggs afterwards.

But there is a distinction we can make between flowers and wasps. If there is no distinction we can make between Schwartz and non-Schwartz, then we are susceptible to the sample problem with or without AI. And if there is a distinction, then we can use it to test Bob and make him learn from his test failures.

Sure.

But the whole point is that there is a significant difference between Schwartz and non-Schwartz, one that only shows up after they start working for real, producing new work rather than rehashing established material, and it takes years to detect. By that time, Bob's forty.

It isn't a "sample problem"; it's a process problem. By perpetually raising the stakes and focusing on metrics (e.g. grades and publication counts for students, graduation rates for schools), we've created and fallen into a Poe's law trap. Adding a new metric isn't likely to help.

What might help? Making the metrics harder to game (e.g. oral exams, early and often), making them more discerning (grade deflation), moving the wrong-track consequences earlier (start holding people back in grade school, make it easier to fail out of high school, make getting into college harder, etc.), and changing the cash-cow funding models to remove the perverse incentives.

We aren't likely to do any of these things.