I'm screenshotting this, let's see who's right.
Actually, your whole point about LLMs not being able to detect correctness is just demonstrably false if you play around with LLM agents a bit.
A system outputting correct facts tells you nothing about that system's ability to prove the correctness of facts. You cannot assert that property of a system by treating it as a black box. If you are able to treat LLMs as a white box and prove correctness of their internal states, you should tell that to some very important people; that is an insight worth a lot of money.
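Here is a minimal sketch of that black-box point (all names hypothetical): two systems with identical input/output behavior, only one of which does any internal checking. No amount of observing correct outputs lets a tester attribute verification ability to either one.

```python
# Minimal sketch (all names hypothetical): two systems whose outputs are
# observationally identical, even though only one checks anything internally.

MEMORIZED = {"2 + 2": "4", "3 * 3": "9"}

def rote_system(query: str) -> str:
    """Emits correct answers from a lookup table; no notion of proof."""
    return MEMORIZED[query]

def checking_system(query: str) -> str:
    """Emits the same answers, but verifies each one before emitting it."""
    answer = MEMORIZED[query]
    assert eval(query) == int(answer)  # a real, if crude, internal check
    return answer

# A black-box tester sees the same input/output behavior from both,
# so observing correct outputs cannot establish an ability to verify:
for q in MEMORIZED:
    assert rote_system(q) == checking_system(q)
print("observationally indistinguishable")
```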
As usual, my argument brought out of the woodwork all the people obsessed with some tangential argument. Sorry to touch your tangent, bud.
> LLMs not being able to detect correctness is just demonstrably false if you play around with LLM agents a bit.
How is telling you that this method of determining correctness is incapable of doing so only tangential?
Correctness and proven correctness are different things. I suspect you're a big Rocq Prover fan.
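To be concrete about the distinction: a statement can be true, and even reliably emitted by a system, without any proof object existing. Proven correctness means a checker has accepted an explicit proof term. A minimal Lean sketch (theorem names are mine):

```lean
-- True with or without anyone proving it; "proven correct" means the
-- kernel has accepted an explicit proof term for the statement.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- A slightly less trivial case, discharged by a core library lemma:
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```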