My apologies, I thought it would be implicit that I am using the top-tier model of the time given the challenge of the tasks. GPT-5.5 was too new in this top comment (although I did test it a bit in a comment below), so I was using GPT-5.4. Gemini is Pro 3.1 Preview.