Don't the models typically train on their input too? I.e. submitting the question also carries a risk/chance of it getting picked up?
I guess they get such a large input of queries that they can only realistically check and therefore use a small fraction? Though maybe they've come up with some clever trick to make use of it anyway?
OpenAI and Anthropic don't train on your questions if you have pressed the opt-out button and are using their UI. LMArena is a different matter.
they probably dont train on inputs from testing grounds.
you dont train on your test data because you need to have that to compare if training is improving or not.
Given they asked in on LMArena, yes.
Yeah, probably asking on LMArena makes this an invalid benchmark going forward, especially since I think Google is particular active in testing models on LMArena (as evidenced by the fact that I got their preview for this question).
I'll need to find a new one, or actually put together a set of questions to use instead of just a single benchmark.