Genuine question: how did you settle on these 11 LLMs rather than 10 or 12? I'm interested in understanding how you benchmarked them, or whether this was an arbitrary ensemble you selected.