Model output mirrors your input, and the effect is self-reinforcing over the course of a whole conversation. The color you add around a problem influences the model's behavior.
A "dumber"/vague framing will get a less insightful solution, or possibly no solution at all.
I don't even necessarily think this is a critical flaw - in general it's just the model tuning its responses to your style of prompt. People use LLMs for all kinds of different tasks, and the "modes of thought" for responding to an Erdős problem versus a software engineering question versus a more human/soft-skills topic are all very different. I think the "prompt sensitivity" issue just comes bundled with this general behavior.
Keeping a pristine context is so important that I use two separate conversations whenever I'm doing something meaningful. One is the main task executor; the other is where I bounce random problems, thoughts, and ideas, so that the executor instance's context stays pristine.
It's sort of an agentic loop where I am one of the agents.
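To make that framing concrete, here is a minimal sketch of what the two-conversation setup looks like if you drive it through an API instead of a chat UI. It assumes the OpenAI Python client, and the model name and helper function are illustrative rather than the exact tooling I used; the only point is that the executor and the scratchpad are separate message histories, with me acting as the router between them.

```python
# Minimal sketch of the two-conversation setup (assumes the OpenAI Python
# client; model name and helper are illustrative, not the actual tooling).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # hypothetical model choice

# Two independent histories: the executor never sees the scratchpad.
executor: list[dict] = [
    {"role": "system", "content": "You are executing the main task."}
]
scratchpad: list[dict] = [
    {"role": "system", "content": "You are a sounding board for side questions."}
]

def ask(history: list[dict], prompt: str) -> str:
    """Append the prompt to one history, get a reply, and record it there."""
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model=MODEL, messages=history)
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text

# The human is the router: side questions go to the scratchpad, and only
# the distilled conclusion is fed back to the executor, keeping its
# context pristine.
idea = ask(scratchpad, "Is approach A or B better for parsing this log format?")
ask(executor, f"Use this approach for the parser: {idea}")
```

The design choice is simply that noise accumulates in the scratchpad instead of the executor, and I decide what crosses over.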