I don't really see much of a difference between modeling surprises and last-minute surprises only uncovered when the program runs. If the gist of what you're saying is that an experienced proof engineer should have fewer modeling surprises than an experienced software engineer has last-minute code execution surprises, then I can sort of agree. But the "contours" of a proof you're describing are basically analogous to decomposing an application into individual components (modules/functions), and if you do this and test your components, you should expect your software to have no "last-minute surprises" except for modeling failures as well.
I guess the point I mostly disagree with is the feasibility of what you describe here:
> In particular, if you get definitions right and if your plan actually makes sense mathematically (i.e. the proof is possible using your planned breakdown), you have a much more linear sense of progress.
This presupposes that you know how the proof should be written, which means that you not only understand the proof mathematically (whatever your definition of "mathematically" is), but you ALSO know how to warp that understanding into a machine proof. The author notes that the "getting the definitions right" part was slower than writing out the definitions themselves would have been, for a variety of reasons which ultimately meant they had to carefully comb through the definitions to make sure there were no logic errors.
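To give a toy flavor of the kind of logic error I mean (my own hypothetical example, not from the post, assuming a recent Lean 4 toolchain where the `omega` tactic is available): natural-number subtraction in Lean truncates at zero, so a definition can type-check and "look right" while quietly meaning something weaker than intended.

```lean
-- Hypothetical example: Nat subtraction truncates at zero, so this
-- definition type-checks but behaves unexpectedly at n = 0.
def decrement (n : Nat) : Nat := n - 1

-- The "obvious" spec is false: decrement 0 + 1 = 1, not 0.
-- theorem decrement_succ (n : Nat) : decrement n + 1 = n := ...

-- The honest statement needs a side condition:
theorem decrement_succ (n : Nat) (h : 0 < n) : decrement n + 1 = n := by
  unfold decrement
  omega
```

Nothing flags the weaker meaning for you; you only catch it by combing through the definitions, which is exactly the work the author describes.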
So I don't really disagree with your overall point (which is that the author can probably be trusted to say they are actually at 50%), but I think you oversell the assurances granted by formal verification when compared to software engineering.
It’s not necessarily smooth sailing for sure. But in my (limited!) experience, there is a noticeable difference, so that’s what I wanted to note.
There indeed can be a gap between “it works in my head” and convincing the machine to do it, and sometimes the gap is wide. For example, the type-theoretic way of looking at things (as customary in Lean) might not match the idioms in the paper proof, or some crucial lemmas known to be true might have never been formalized. Or the tactics to prove some piece are getting too slow. Or you got the definitions wrong, or they’re annoying to deal with. So that is its own set of difficulties.
Even with that, the nature of difficulties feels different from much of software work to me. In software, I find that “divide and conquer” rarely works and I need to “grow” the program from a primitive version to a more sophisticated one. But in proofs, “divide and conquer” actually sort of works — you gain some assurances from making a high-level proof with holes in the middle, then filling in those holes with more intermediate steps, and so on. You can actually work “top down” to a meaningful extent more than in software.
To put it another way, with proofs, you can work top-down and bottom-up at the same time, making real progress while a bunch of steps are stubbed out. In programming, “stubbing out” non-trivial parts of the program is often annoying or impossible, and you can only make progress on a smaller (but complete) piece of it, or a less ambitious (but still complete) piece of it. This is because you can’t run an incomplete program, and there’s a set of knowledge you only gain from running it. But you absolutely can check an incomplete proof, with arbitrary granularity of incompleteness.
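To make that concrete, here’s a toy Lean 4 sketch (my own invented statement, not from any real project): the high-level decomposition is checked by the kernel even while the subproofs are still holes.

```lean
-- Toy example: `sorry` stubs out the unfinished subproofs, but Lean
-- still verifies that the overall decomposition is well-formed.
theorem shuffle (a b c : Nat) : (a + b) + c = (c + b) + a := by
  have step1 : (a + b) + c = a + (b + c) := by
    sorry -- hole to fill in later; the file still elaborates
  have step2 : a + (b + c) = (c + b) + a := by
    sorry -- another independent hole
  exact step1.trans step2
```

Each `sorry` is a hole you can fill independently, and if the decomposition itself doesn’t type-check, you find out immediately rather than after finishing all the pieces.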
So I agree with you it’s not necessarily linear. But it’s also not exactly like software, and analogies from software work don’t apply 1:1.