I don’t care how practical it may or may not be; this is my new favorite LLM benchmark.