We should have a finish hook that, when the AI decides it's run, runs the hook, and gives it to the LLM, and it can decide whether the problem is still there.

Students don't get to choose whether to take the test, so why do we give AI the choice?