Maybe not efficient, but if the LLMs can't even reach this benchmark, then I'm not sure.