Your “some methodological deficits” is doing a lot of work.
What if the methodological deficits are actually causing the paper to underestimate the quality of the AI responses? Why assume any deficits would bias the estimate of the AI's competence upwards instead of downwards?