I'm not sure what point you're trying to make. In science and engineering, being able to provide justification is a core skill. The comparison we should be making is against the human practitioners who are trained in their fields. There will always be a distribution of ability. Saying that there's evidence that people are capable of providing post-hoc rationalization doesn't say anything about the ability of experts to produce well-thought-out responses (in their respective fields) that don't immediately fall apart under scrutiny.
Structured thinking and deliberation are indeed important, but you can also make LLMs do structured "thinking" if you work hard enough. They can generate quite plausible reasoned arguments with valid real-world results, and you can get them to write down their working as they go. But as research has shown, it's not "true" thinking, just pattern matching at a higher level, and it eventually runs out of steam.[0]
But you only have to drill down a couple more layers and you are back in the void again: do you have any proof that your own thinking, no matter how structured and accurate, is anything other than pattern matching at a level so much higher that you are incapable of seeing it as such?
I think we will be finding some very interesting things out soon using the combination of LLMs and theorem provers, as demonstrated by Terence Tao's recent work.[1]
A cheetah is not a motorbike is not an aircraft is not a rocket.
[0] https://arxiv.org/abs/2506.06941
[1] https://arxiv.org/abs/2603.12744