I don't think that would be enough to solve the problem.
I'm using Gemini with AI Studio, and the sheer size of a 1-million-token context window is becoming apparent to me. I have a large conversation, multiple paragraphs of text on each side, that amounts to only 100k tokens or so. Just scrolling through it is such a chore that it's easier to ask the LLM what we were talking about earlier than to find it myself.
So if I have several tools, each adding 10k+ tokens of context to a query, and all of them reasonable tool requests, I still can't verify that none of it is something "you [I] didn't want to get executed," since that's a vague description of a tool's failure states. I'm not going to read the equivalent of a novel for each and every request.
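To put a rough number on that "novel" claim, here's a back-of-the-envelope sketch. It assumes the common rule of thumb of about 0.75 English words per token (an approximation, not a property of any specific tokenizer) and a typical novel length of ~80,000 words; both figures are assumptions, not measurements:

```python
# Rough arithmetic behind the "novel per request" claim.
# Both constants are ballpark assumptions, not measured values.
WORDS_PER_TOKEN = 0.75    # common rule of thumb for English text
WORDS_PER_NOVEL = 80_000  # a typical novel length

def review_burden(tool_context_tokens: list[int]) -> float:
    """Roughly how many 'novels' of text a human would need to read
    to fully inspect one request's tool-added context."""
    total_words = sum(tool_context_tokens) * WORDS_PER_TOKEN
    return total_words / WORDS_PER_NOVEL

# Five tools each contributing ~10k tokens is already about half
# a novel of text to verify, per request.
print(review_burden([10_000] * 5))  # ~0.47
```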
I say this mostly because I think some level of inspectability would be useful for these larger requests; manual review just becomes impractical as context sizes keep growing.