Exactly. But in the papers I’ve seen, the tests are usually done with multiple-choice answers:
Where do you eat?
A) floor
B) table
C) dirt
In this case, the questions asked have an answer, so the bias would come from the order of the input data. It’s different enough that it triggered my curiosity.
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00638...
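
For the multiple-choice setup, a rough way to check for order bias is to ask the same question under every ordering of the options and count which position gets picked. This is only a sketch, not the method from the linked paper: `position_counts`, `ask`, and `always_first` are made-up names, and `always_first` is a toy stand-in for a real model.

```python
from collections import Counter
from itertools import permutations


def position_counts(ask, question, options):
    """Count which 0-based position is picked across every ordering of options.

    `ask` is a hypothetical callable (question, options) -> chosen option string.
    """
    counts = Counter()
    for ordering in permutations(options):
        picked = ask(question, list(ordering))
        counts[ordering.index(picked)] += 1
    return counts


# Toy stand-in for a model (assumption): always picks whatever is listed first.
def always_first(question, options):
    return options[0]


counts = position_counts(always_first, "Where do you eat?", ["floor", "table", "dirt"])
print(counts)
```

A responder that only looks at position puts all 6 orderings on position 0, while one that always answers "table" regardless of order spreads evenly, 2 per position. A real model would presumably land somewhere in between.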