> all LLMs that I know of rely on entropy and randomness to emulate human creativity
Those are tuneable parameters. Turn down the temperature and top_p if you don't want the creativity.
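A minimal sketch of what that tuning looks like, using the OpenAI Python SDK purely as a familiar example (other providers expose the same knobs under similar names; the model and prompt here are placeholders):

```python
# Sketch: temperature/top_p are just request parameters, not fixed properties of the model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model works the same way
    messages=[{"role": "user", "content": "Summarize Q3 revenue drivers."}],
    temperature=0.0,  # pick the most likely token instead of sampling
    top_p=1.0,        # nucleus sampling effectively disabled at temperature 0
)
print(response.choices[0].message.content)
```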
> Claude is not going to improve this.
We can measure models vs humans and figure this out.
To your own point, humans already make "rampant" mistakes. With models, we can scale inference-time compute to catch and eliminate mistakes, for example by running six independent validators using different methodologies (see the sketch below).
One-shot financial models are a bad idea, but properly designed systems can probably match or beat humans pretty quickly.
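A rough sketch of that independent-validators idea, with a majority vote over the verdicts. The prompts, model name, and the check() helper are hypothetical illustrations, not any particular product's API:

```python
# Sketch: spend extra inference-time compute on several independent validation
# passes, each using a different methodology, then take a majority vote.
from collections import Counter
from openai import OpenAI

client = OpenAI()

VALIDATOR_PROMPTS = [
    "Recompute the spreadsheet totals cell by cell and report PASS or FAIL.",
    "Check every formula for unit and sign errors and report PASS or FAIL.",
    "Cross-check the outputs against the stated assumptions and report PASS or FAIL.",
]

def check(artifact: str, instructions: str) -> str:
    """One independent validation pass; returns 'PASS' or 'FAIL'."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": artifact},
        ],
        temperature=0.0,
    )
    text = reply.choices[0].message.content or ""
    return "PASS" if "PASS" in text.upper() else "FAIL"

def validate(artifact: str) -> str:
    """Majority vote across the validators (use more prompts for a 6x setup)."""
    votes = Counter(check(artifact, p) for p in VALIDATOR_PROMPTS)
    return votes.most_common(1)[0][0]
```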
> Turn down the temperature and top_p if you don't want the creativity.
This also reduces accuracy in real terms. The randomness is used to jump out of local minima.
That's at training time, not inference time. And temp/top_p aren't used to escape local minima; methods like SGD mini-batch sampling, Adam, dropout, LR decay, and other techniques do that.
Ahh okay, so you really can't escape the indeterminacy?
You can zero out temperature and get determinism at inference time. Which is separate from training time where you need forms of randomness to learn.
The point is that the "entropy and randomness" in the quote "all LLMs that I know of rely on entropy and randomness to emulate human creativity" is a runtime parameter you can tweak down to zero, not a fundamental property of the technology.
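For what the knob actually does mechanically, here's a small sketch (assuming standard temperature scaling of the logits before softmax): as temperature goes to zero, sampling collapses onto the argmax and decoding becomes greedy.

```python
# Sketch: logits are divided by T before softmax, so T -> 0 degenerates to argmax.
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    if temperature == 0.0:
        return int(np.argmax(logits))        # deterministic greedy pick
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.9, 0.1])
print(sample_next_token(logits, 0.0, rng))   # always index 0
print(sample_next_token(logits, 1.0, rng))   # occasionally index 1 or 2
```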
Right, but my point is that even if you turn the temperature all the way down, you're not guaranteed to get an accurate or truthful result. You may get a mostly repeatable, deterministic result, but there is still some indeterminacy.
> Those are tuneable parameters. Turn down the temperature and top_p if you don't want the creativity.
Ah yes, we'll tell Mary from Payroll she can just tune them parameters if there is more than "like 2%" error in her spreadsheets.
No one said it was a user setting. The person building the spreadsheet agent system would tune the hyper-parameters with a series of eval sets.
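Something like the following, roughly: sweep the sampling settings against a held-out eval set and ship whichever configuration scores best. The eval cases, model, and run_agent() wrapper here are hypothetical stand-ins for the builder's own harness:

```python
# Sketch: the builder, not the end user, picks temperature/top_p via eval-set accuracy.
from itertools import product
from openai import OpenAI

client = OpenAI()

EVAL_SET = [
    {"input": "Sum column B for January: 1200, 1500, 1500.", "expected": "4200"},
    {"input": "Average of 36, 37, 38 headcount in Q2?", "expected": "37"},
]

def run_agent(task: str, temperature: float, top_p: float) -> str:
    """Placeholder spreadsheet agent: one model call, answer returned as text."""
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": task + " Answer with the number only."}],
        temperature=temperature,
        top_p=top_p,
    )
    return (reply.choices[0].message.content or "").strip()

def accuracy(temperature: float, top_p: float) -> float:
    hits = sum(run_agent(c["input"], temperature, top_p) == c["expected"] for c in EVAL_SET)
    return hits / len(EVAL_SET)

best = max(product([0.0, 0.2, 0.7], [0.9, 1.0]), key=lambda params: accuracy(*params))
print("best (temperature, top_p):", best)
```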
Technically it’s deterministic. It just might not be correct :)