I put this into Grok and it got the right answer on quick mode. I did not give multiple choice though.
The real solution is to have 4 AIs answer and let the human decide. If all 4 say the same thing, easy. If there is disagreement, further analysis is needed.
The issue with "adversarial" questions like the blood pressure one (which was open-sourced and published a year ago) is that they are eventually ingested into model training data.
Shouldn't it be 3 or 5? https://news.ycombinator.com/item?id=46603111
Are two heads better than one? The post explains why an even number doesn't improve decision-making.
Would that still be relevant here?
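For what it's worth, here is a rough sketch of the odd-vs-even question. It is my own toy model, not necessarily the linked post's argument: it assumes each of n voters is independently right with probability p, and that ties are broken by a coin flip. The function name is made up for illustration.

```python
from math import comb

def majority_accuracy(n: int, p: float) -> float:
    """Chance that a majority vote of n independent voters is right.

    Assumptions (mine, not the post's): each voter is right with
    probability p, and an exact tie is settled by a fair coin flip.
    """
    total = 0.0
    for k in range(n + 1):
        # Probability that exactly k of the n voters are right.
        prob_k = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k > n:
            total += prob_k          # clear majority is right
        elif 2 * k == n:
            total += 0.5 * prob_k    # tie, coin flip
    return total

for n in (1, 2, 3, 4, 5):
    print(n, round(majority_accuracy(n, 0.8), 5))
```

Under those assumptions, 2 voters do no better than 1 and 4 do no better than 3, while 5 does better than 3. Real models are not independent, so treat this as intuition only.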
That was a binary situation, and more evidence wasn't helping improve anything.
You could change the standards. If any of the 4 fail, then reject the data.
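A minimal sketch of that stricter rule, which sidesteps the tie problem because there is no vote at all: accept only when every answer matches, otherwise hand it to a human. The function name and the lowercase/strip normalization are my assumptions; real model outputs would need smarter answer matching.

```python
from typing import List, Optional, Tuple

def unanimous_or_review(answers: List[str]) -> Tuple[str, Optional[str]]:
    """Accept only if every model gave the same answer.

    Returns ("accept", answer) on full agreement, otherwise
    ("review", None) so a human can look at the disagreement.
    Comparison is a naive case-insensitive exact match (an assumption).
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    if len(set(normalized)) == 1:
        return ("accept", normalized[0])
    return ("review", None)

print(unanimous_or_review(["Yes", "yes ", "YES", "yes"]))
print(unanimous_or_review(["yes", "yes", "no", "yes"]))
```

With this rule the count of models being even or odd stops mattering, at the cost of sending more cases to review.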