I use the Monty Hall problem to test people in two steps. The second step: after we discuss it and arrive at a framing they can understand, can they then explain it to a third person? The third person rarely understands, and the attempt at explanation reveals how shallow the second person's understanding is. The shallowest understanding I've encountered in any similar process is usually an LLM's.
I am not sure how good your test really is. Or at least how high your bar is.
Paul Erdős was told about this problem, with multiple explanations, and simply rejected the answer. He could not believe it until they ran a simulation.
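For anyone who wants to repeat that kind of check, here is a minimal sketch of a Monty Hall simulation in Python. It is not the simulation Erdős was shown; the function names (`play`, `win_rate`), the trial count, and the seed are my own illustrative choices. It assumes the standard rules: three doors, and the host always opens a door that hides neither the car nor the contestant's pick.

```python
import random


def play(switch, rng):
    """Play one round and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)    # where the car is hidden
    pick = rng.choice(doors)   # contestant's first choice
    # Assumed rule: the host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the first pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the fraction of rounds won under a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

With enough trials, staying should win about one third of the time and switching about two thirds, which is the answer people tend to reject.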
In my experience, as Harvard outlined long ago, the two main issues with decision making are frame blindness (not considering enough other ways of thinking about the issue) and non-rigorous frame choice (jumping to conclusions).
But as a teacher, I find an even more fundamental cause: seemingly different frames are often simply misunderstood, rather than understood and rejected. I learned this by trying many ways of presenting what I thought was the best frame. "Explanations" may be received primarily as noise, with "What is actually being said" incorrectly replaced by "What I think you probably mean". Whenever someone replies "okay" to a yes-or-no statement, I find they have always misunderstood it, and I have learned how often people will attempt to move forward without understanding where they are.
And if multiple explanations are just restatements of the same frame (as is common in casual arguments), it's impossible to compare frames, because only one is being presented. It's the old "if you think you aren't making any mistakes, that's another mistake".
Often, a faulty frame both clears up what is wrong with another frame and leads to a better one. I usually find the most fundamental frame is the most useful.
For example, I found many Reddit threads discussing a problem with selecting the audio output (speaker) on Fire TV Sticks. If you go through the initial setup, sometimes it will give you a choice (first level of the flow chart), but often not the next-level choice, which you need, and setup will not continue. It turned out that old remotes and new remotes had the volume buttons in different locations, and there were two sets of what looked like volume buttons. When you pressed the actual volume buttons, everything worked normally. When you pressed the up/down arrows where the old volume buttons had been, you had to restart setup many times.
The correct framing of the problem was "Volume buttons are now on the left, not the right". It was not a software setup issue. It's like wondering why your key doesn't work when you're at the wrong car, or blaming your starter motor when you're out of gas. Etc.
I don't know who Paul Erdős is, so this isn't useful information without considering why they rejected the answer and what counterarguments were provided. It is an unintuitive problem when you approach it as a simple probability problem rather than as one where revealing new context changes the odds.
Erdős published more papers than any other mathematician in history and collaborated with more than 500 coauthors, giving rise to the concept of the "Erdős number," a (playful) measure of collaborative proximity among mathematicians.