I wonder how hard it is to get the setup ready for the AI to evolve on?

I spent 2-3 hours setting up; most of that time went into writing the evaluator

Actually, I think the evaluator is the most important part of getting the whole pipeline to work

Yes, getting the right workloads and ensuring correctness are crucial parts of the process
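
For anyone wondering what such an evaluator might look like, here is a rough sketch, not the one used in this thread. It assumes the evolved program exposes a single function that can be compared against a trusted reference; the names `evaluate`, `best_time`, `candidate`, `reference`, and `workloads` are all made up for illustration. Correctness is checked first on every workload, and only a fully correct candidate gets a speed score.

```python
import copy
import time


def best_time(fn, args, repeats):
    """Return the fastest of `repeats` timed runs of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        fresh = copy.deepcopy(args)  # guard against in-place mutation of inputs
        start = time.perf_counter()
        fn(*fresh)
        best = min(best, time.perf_counter() - start)
    return best


def evaluate(candidate, reference, workloads, repeats=3):
    """Hypothetical evaluator: gate on correctness, then score by speedup.

    `workloads` is a list of argument tuples. A candidate that gets any
    workload wrong scores 0.0, so speed can never compensate for wrong output.
    """
    cand_total = 0.0
    ref_total = 0.0
    for args in workloads:
        expected = reference(*copy.deepcopy(args))
        actual = candidate(*copy.deepcopy(args))
        if actual != expected:
            return {"correct": False, "score": 0.0}
        cand_total += best_time(candidate, args, repeats)
        ref_total += best_time(reference, args, repeats)
    # Speedup relative to the reference; the floor avoids division by zero.
    return {"correct": True, "score": ref_total / max(cand_total, 1e-12)}


if __name__ == "__main__":
    workloads = [([3, 1, 2],), ([],), (list(range(1000, 0, -1)),)]
    print(evaluate(lambda xs: sorted(xs), sorted, workloads))
    print(evaluate(lambda xs: xs, sorted, workloads))  # wrong output, score 0.0
```

The workload list matters as much as the check itself: if it only covers small or easy inputs, the search can favor candidates that are fast on those and wrong or slow elsewhere.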