> not an LLM! closer to something like AlphaGo, trained only on ARC, Sudoku and mazes.
ah! this explains the performance.
What is the conventional wisdom on improving codegen in LLMs? Sample n solutions and verify, or run a more expensive tree search?
I have thoughts on a very elaborate add-a-function-verify-and-rollback testing harness, and I wonder if this has been tried.