Interesting questions. I think I'll attempt #7.

Tried all ten with Claude, then had Codex take a look at the work. Codex thinks number 7 has the lowest chance of being correct, rating it 1 out of 10. None of the answers, as done by Claude Opus 4.6 and evaluated by Codex 5.3 (highest setting), rated higher than a 7/10 chance of being right.

Not going to spend too many more tokens on this.

I don't think either of these is the best choice for this. I believe ChatGPT 5.2 Pro and Gemini 3 Pro deep thinking are the strongest LLMs at "pure thought", i.e. things like mathematical reasoning.