> What are you talking about, it had the option for nuanced responses

The prompt allowed for exactly four valid outputs and explicitly disallowed explanations and qualifiers.

> Output exactly one label: True, Mostly True, Misleading, or False. No explanations, no qualifiers.

How is that a nuanced response?

> These types of experiments prove to me that there is no real "reasoning" happening and "reasoning/thinking"

I'd suggest that five presumably reasoning, thinking humans would also vary in their responses to the exact same prompt.