There are more details under the "Too narrow and too wide tests" heading.
It would be interesting to see a deeper investigation into how the models are dealing with this and whether the successful ones seem to have been trained on the benchmark.