Simon is right about using code execution, but many tables one might look at outside of formal data work are small enough that LLMs handle them very reliably, so this format question is practically relevant. I wish they had tested better models.