So we need to either generate benchmarks after the models finish training, or keep the solutions to the benchmark problems closed source.