> will measure the ability to query LLMs.

Hell, most places are requiring their developers to query LLMs now anyway.