It measures the ability to complete, at a given success rate, a task that has a known human benchmark completion time. I.e., they set the task to human volunteers and timed how long they took to complete it.
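
To make that concrete, here is a rough sketch of how such a number could be read off a set of results. It is an illustration only, not necessarily how the benchmark's authors compute it: the function name `time_horizon`, the grouping of tasks by identical human time, and the data in `example_results` are all made up for the example.

```python
from collections import defaultdict

def time_horizon(results, target_rate):
    """Return the longest human benchmark time (in minutes) such that the
    model meets target_rate on tasks of that length and on every shorter
    length. Returns None if even the shortest tasks miss the target.

    results: list of (human_minutes, succeeded) pairs, one per attempt.
    This grouping rule is a simplifying assumption for illustration.
    """
    outcomes_by_time = defaultdict(list)
    for human_minutes, succeeded in results:
        outcomes_by_time[human_minutes].append(succeeded)

    horizon = None
    for human_minutes in sorted(outcomes_by_time):
        outcomes = outcomes_by_time[human_minutes]
        success_rate = sum(outcomes) / len(outcomes)
        if success_rate < target_rate:
            break  # stop at the first task length that misses the target
        horizon = human_minutes
    return horizon

# Hypothetical data: (minutes the human volunteers took, did the model succeed?)
example_results = [
    (1, True), (1, True),
    (5, True), (5, True),
    (15, True), (15, False),
    (60, False), (60, False),
]

print(time_horizon(example_results, 0.5))
print(time_horizon(example_results, 0.8))
```

The point of the sketch is that the answer depends on the success rate you pick: asking for a higher success rate on the same results gives a shorter horizon.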