>Consciousness tries to make things up, it learns that people notice this, it then begins trying to construct justifications that won't be predictably called out as false.
There's a logical "skip" between that and
>Eventually it learns how its unconscious operates, and how to interrogate it, and its post-hoc justifications, at least in the common cases, become reliable.
The brain constructs a narrative that won't be called out as false, one that provides social capital, makes you feel good about yourself, is consistent with all your other justifications, etc. It's only an assumption that this process would naturally converge on Truth. Considering that it's massively-multiplayer chaos where brains coordinate their stories in complex ways, my assumption is that it would converge on *stability*, not truth.
Yep. Truth is easy, so it converges on truth unless there's a strong reward for lies. It's a neural network. It just reads off/probes the internal state because that's the cheapest way to model the unconscious. The justification won't necessarily be true, mind, in terms of the labels it applies, but it should mostly be true structurally: behaviorally predictive in the ordinary domain.
(Even if you are incentivized to lie and flatter yourself, it is still helpful to have access to the true signal internally, because that way you can know how to structure your lie to best avoid detection.)