Relative probabilities. That means comparing 2+ alternatives, and we're only talking about the model's worldview, not objective reality. The math for that is relatively straightforward. "Yes" on its own could be 0.9, and OK, that means nothing. But if we artificially constrain the outputs to "Yes" and "No", and calculate the softmax for Yes to be 0.7 and No to be 0.3, that does lead to a straightforward probability calculation. [Not the naïve calculation you would expect, because of how softmax is computed. But you can derive an equation to convert it into normalized probabilities.]
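
To make that concrete, here is a minimal sketch of the constrained Yes/No readout, assuming a Hugging Face causal LM. The model name, prompt, and tokenization details below are placeholders, not a specific recommended setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

prompt = "Question: Is Paris the capital of France? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# The leading space matters for BPE tokenizers like GPT-2's.
yes_id = tokenizer.encode(" Yes")[0]
no_id = tokenizer.encode(" No")[0]

# Softmax restricted to the two alternatives:
# P(Yes | {Yes, No}) = exp(z_yes) / (exp(z_yes) + exp(z_no)),
# which is the same as p_yes / (p_yes + p_no) from the full softmax.
p_yes, p_no = torch.softmax(logits[[yes_id, no_id]], dim=0).tolist()
print(f"Yes: {p_yes:.3f}  No: {p_no:.3f}")
```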

And now I'm certain we're talking past each other. I'm not talking about calibrated probabilities at all. Just the notion of "how confident do I feel about this?" which is what I interpreted the question above to be about. You can get that out of an LLM, with some work.

> But if we artificially constrain the outputs to "Yes" and "No", and calculate the softmax for Yes to be 0.7 and No to be 0.3, that does lead to a straightforward probability calculation. [Not the naïve calculation you would expect, because of how softmax is computed. But you can derive an equation to convert it into normalized probabilities.]

There is nothing straightforward about this, and no, there is no such formula.

> I'm not talking about calibrated probabilities at all. Just the notion of "how confident do I feel about this?"

If all you care about is vibes, sure. If you actually need numerical guarantees and quantitative estimates, so that your "feelings" of confidence can rigorously justify decisions, you need calibration. If you aren't talking about calibration in these discussions, you are missing probably the single most important technical concept for addressing these issues seriously.
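
And calibration is something you can actually measure. Here is a minimal sketch of expected calibration error (ECE), assuming you have paired lists of self-reported confidences and binary correctness outcomes (the toy data is made up for illustration):

```python
import numpy as np

def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Standard ECE: bin predictions by confidence, then compare each
    bin's average confidence to its empirical accuracy."""
    conf = np.asarray(confidences, dtype=float)
    correct = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # Weight each bin's confidence/accuracy gap by its share of samples.
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece

# Toy example: the model claims 90% confidence but is right only 60% of the time.
print(expected_calibration_error([0.9] * 10, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))
# ≈ 0.3, i.e. badly miscalibrated
```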

We're talking about artificial intelligence. Making computers think the way people do. People are notoriously miscalibrated on their own self-assessed probabilities too.

Finding a way to objectively calibrate a sense of "how confident do I feel about this?" would be fantastic. But let's not move the goalposts. It would still be incredibly useful to have a machine that merely matches the statement of confidence or uncertainty a human would assign to their own mental model, even if badly calibrated.

IMO it is you who are moving the goalposts, most likely in an attempt to hide the fact you were unaware of calibration before this discussion.

> It would still be incredibly useful to have a machine that merely matches the statement of confidence or uncertainty a human would assign to their own mental model, even if badly calibrated.

If human feelings are badly calibrated, they are useless here too, so no, I don't agree. Things like "confidence" only matter if they are actually tied to real outcomes in a consistent way, and that means calibration.

Please assume good faith.