I'm never sure how much faith one can put in such benchmarks, but in any case the optics seem to shift once you look at pass@2 and pass@3.
Still, the more interesting comparison would be against something like Codex.