This is my biggest pet peeve with claims that LLMs are as capable as humans, or that we have achieved AGI or are close to it.

But then when I get a subpar result, they always tell me I'm "prompting wrong". LLMs may well be capable of great human-level output, but in my experience they leave a LOT to be desired in terms of human-level understanding of the question or prompt.

I think rating an LLM against a human, or against AGI, should include its ability to understand a prompt the way a human, or any averagely generally intelligent system, would.

Are there any benchmarks for that? For example, how well do different LLMs handle misleading or underspecified prompts compared to one another?

Because if a good prompt is as important as people say, then the model's ability to understand a prompt, especially a poor one, could have a massive impact on its output.

It's a type of cognitive bias not much different from that of an addict or an indoctrinated cult follower. A subset of them may genuinely fear Roko's basilisk in exactly the same way colonial religion leveraged the fear of eternal damnation in hell to keep people subservient to church leaders.

hyperstitions from TESCREAL https://www.dair-institute.org/tescreal/