There's a famous quote from Dijkstra: "Program testing can be used to show the presence of bugs, but never to show their absence." The flaw of testing is that it can only test the behaviors that you think might be problematic. To actually reach into the category of proactively fixing behaviors that you didn't know could go wrong, you have to reach for more exotic tools in the toolkit. Fuzz testing is a start down this path; formal verification is a different start down this path.
Obviously, the quality of these sorts of tools is dependent on your ability to write a formal model that is comprehensive in allowing behavior you want to be acceptable and disallowing behavior you want to be unacceptable, and we're still surprisingly far from that in many fields.
The problem with that mindset is that testing, like verification, becomes extremely powerful as it becomes more automated and you throw more compute at it. And it's arguably easier to automate testing than it is to automate formal verification.
Sure, testing isn't perfect. But is finding 100% of the bugs that much better than finding (say) 99% of them? This is especially the case if the missed bugs tend to be those that happen very rarely.
A formal specification allows automatic generation of tests. So run billions of tests, randomly generated, and see if any violate the specification.
Even theorem proving systems use this sort of thing as a short cut, for example pruning off attempts to prove a universally quantified statement by looking for counterexamples.