> I find this very surprising, and you probably should too.

Why? My first assumption would be that Claude Code would be exceptionally good at writing Lean proofs, for many of the reasons given in the article. It's been on my todo list for a while to try out some of my coding workflows on proof writing. I'm glad OP did.

But why does he find this surprising? That wasn't laid out.

EDIT: To be clear, I'm talking about this bit:

> I think this is part of why Claude Code is surprisingly good at theorem proving. Lean is very strict in the programs it will accept. This makes writing Lean arduous for humans, but it means an AI agent gets detailed feedback.

I first encountered this in writing Rust programs, first with Cursor (using anthropic models), and now with Claude Code. Things have gotten better, but initially Rust was not very well represented in the training sets used, and the agents would be horrendous beginner mistakes. Writing Javascript-like code with Rust-ish syntax, and expecting it to work.

But the rustc compiler gives very good errors, with context and suggestions for fixing. In two or three iterations it would fix all the mistakes, without any need for me to get involved. And because Rust's type system and safety guarantees are so strict, by the time it gets it to compile it _probably_ works.

It's been my assumption since that experience (my first experience using a coding agent), that it would be very good at writing machine-checked proofs.