Isn't it possible that o1 was also trained on this data (or something super similar) directly? The score seems disproportionately high.
They definitely considered it. Early articles from The Information talked about how high Strawberry's performance on it was.