The powerful difference is between specific testing and exhaustive proving. If you fail to think of a test for an edge case, whoops. If you have a good model, you can know that it exists—and fix it—before you ship.

This is particularly valuable IMO in concurrent/distributed/multi-threaded situations where race-conditions and deadlocks are extremely hard to test and reason about. "Can A, B, C happen in the order 'A, C, B'" types of things.

I have a rough hypothesis that maturity in this space looks something like:

1) LLMs will allow much faster learning and usage of formal methods, even if initially just for "do it a second time" post-hoc verification

2) From there you move towards automation of LLM-checking "hey, the implementation code changed, does it look like it broke the model" - though this is still not deterministic, but it's likely a lot better than a human review requirement of something that only changed infrequently would've been

3) And then the real jump will be taking tooling for "write just the formal spec, let the implementation get generated" to the next level. There's various mostly-not-fully-baked-for-what-most-companies-would-want projects already out there for codegen like this, what if the LLM tools can accelerate the year-or-two of manual work it would take to make one of them fit your needs?