Hacker News

Okay thanks I'll try that.

> have run into Claude modifying problem statements, adding axioms, etc.

Same here. I've thought about creating a utility that tells Claude it has to keep going until a test exits with zero status. But I'm concerned Claude would just fake everything to make the test pass.
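
For what it's worth, here is a rough sketch of what such a utility could look like, assuming the usual convention that a passing test exits with status 0. Everything here is hypothetical: `run_until_pass`, `attempt_fix` (a stand-in for whatever hands the failure output back to Claude), and the `protected` list are names I made up for illustration, not part of any existing tool. The hash check is one possible guard against the problem statement or tests being edited to force a pass.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_until_pass(
    test_cmd: Sequence[str],
    attempt_fix: Callable[[int, str], None],
    protected: Sequence[Path] = (),
    max_attempts: int = 5,
) -> bool:
    """Re-run test_cmd until it exits with status 0 or attempts run out.

    attempt_fix(attempt_number, test_output) is a hypothetical hook where
    the model would be asked to try again. Files in `protected` (tests,
    problem statements) must stay byte-identical for a pass to count.
    """
    # Record the original contents of the files that must not change.
    baseline = {p: file_digest(p) for p in protected}

    for attempt in range(1, max_attempts + 1):
        # Refuse to trust a test run if a protected file was altered.
        for p, digest in baseline.items():
            if not p.exists() or file_digest(p) != digest:
                raise RuntimeError(f"protected file changed: {p}")

        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # test passed with the protected files intact

        # Hand the failure output back and let the fixer try again.
        attempt_fix(attempt, result.stdout + result.stderr)

    return False
```

This only catches edits to the files you list. It does nothing about Claude special-casing the implementation to satisfy the test, so the concern about faking still stands.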