I totally agree! Interacting with LLMs at work for the past 8 months has really shaped how I communicate with them (and people! in a weird way).
The solution I've found for "un-loading" questions is similar to the one that works for people: build out the context where it's missing. Describe specifically where the feature will sit and how it'll work, make the model enumerate and research specific libraries, and capture those explorations in distinct documents. Synthesize and analyze those documents, fill in any remaining knowledge gaps, and only then ask for a judgement call.
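The loop above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `ask()` is a hypothetical stand-in for whatever LLM client you use, and the prompts are placeholders.

```python
def ask(prompt: str, context: list[str]) -> str:
    # Hypothetical placeholder: a real version would send the accumulated
    # context documents plus the prompt to a model and return its reply.
    return f"[model reply to: {prompt!r} with {len(context)} context docs]"

def informed_judgement(feature: str, candidate_libraries: list[str]) -> str:
    context: list[str] = []

    # 1. Pin down where the feature sits and how it should work.
    context.append(ask(f"Describe where this feature fits: {feature}", context))

    # 2. Force an explicit exploration of each candidate library.
    for lib in candidate_libraries:
        context.append(ask(f"Research {lib}: API shape, maturity, trade-offs.", context))

    # 3. Synthesize the explorations into one analysis document.
    context.append(ask("Synthesize the research above; flag knowledge gaps.", context))

    # 4. Fill any remaining gaps before asking for a decision.
    context.append(ask("Answer the open questions you flagged.", context))

    # 5. Only now ask for the judgement call.
    return ask("Given all of the above, which library should we use, and why?", context)
```

The point is that the final question lands on a context window already populated with concrete, model-generated research, rather than on an empty prompt.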
As human engineers, we all had to do this at some point in our careers (building up context, memory, points of reference, and experience), so we can now mostly rely on instinct. The models don't carry that advantage between conversations, so you have to help them simulate that growth within a single context window.
Their snap, low-context judgements are highly variable, overly general, and often poor. But their concretely informed judgements (even when that concrete information was obtained by prompting) are impressively solid. Sometimes, after loading up all the concrete evidence, I'll ask an inversely loaded question just to pressure-test the reasoning, and the model will usually push back and defend the "right" solution, which is pretty impressive!
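The pressure test amounts to deliberately loading a follow-up question toward the conclusion the evidence does not support. A rough sketch, again with a hypothetical `ask()` standing in for a real LLM call:

```python
def ask(prompt: str, context: list[str]) -> str:
    # Hypothetical placeholder: a real version would send the context
    # documents and the prompt to a model and return its reply.
    return f"[model reply to: {prompt!r}]"

def pressure_test(evidence_docs: list[str], recommended: str, alternative: str) -> str:
    # Load the question toward the option the evidence does NOT support;
    # a well-grounded model should defend `recommended` anyway.
    loaded_question = (
        f"Given everything above, {alternative} is clearly a better fit "
        f"than {recommended}, right?"
    )
    return ask(loaded_question, evidence_docs)
```

If the model caves to the framing rather than citing the evidence it just gathered, that's a signal the earlier context-building wasn't thorough enough.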
Yes, it's great that you're sharing this in detail! I think I've been using a similar approach to get solid decisions.