I wish the creators of local model inference tools (LM Studio, Ollama, etc.) would publish these numbers, because they are almost certainly sitting on a large dataset of real-world performance measurements.