> It may not be as stupid as it sounds.
It is.
> However, in any probabilistic system, you can run a function k times and you’ll get an output distribution that is meaningful if k is high enough.
This is the underlying flaw in this approach. Attempting to use probabilistic algorithms to produce a singular verifiably correct result requires an external agent to select what is correct from the output of those "k times" invocations, that is, a person capable of making said determination.
> The thing with LLMs is they scale in a way that actually allows this to be possible, in a way that scaling with humans can’t.
For the "k times" generation of text part, sure. Not for determining which of the k outputs, if any, is acceptable for the problem at hand.
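A minimal sketch of the distinction, using a hypothetical toy generator and an oracle-style `verifier` (both stand-ins, not anything from the thread): sampling k times yields a distribution over candidates cheaply, but collapsing that distribution to one correct answer still requires an external judgment.

```python
import random

def sample_k(generate, k):
    """Draw k outputs from a probabilistic generator (the cheap, scalable part)."""
    return [generate() for _ in range(k)]

# Toy generator: "correct" answer is 42, produced only some of the time.
def noisy_answer():
    return 42 if random.random() < 0.3 else random.randint(0, 100)

# An oracle standing in for a human reviewer; knowing this predicate
# is exactly the part that does not scale with more sampling.
def verifier(x):
    return x == 42

candidates = sample_k(noisy_answer, 50)
accepted = [c for c in candidates if verifier(c)]
```

Without `verifier`, `candidates` is just a distribution; the selection step is where the external agent comes in.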
EDIT: clarified "produce a verifiably correct result" to be "produce a singular verifiably correct result"