Hacker News

Unless you literally try every possible combination of inputs (which is usually infeasible), property testing can't give you mathematical guarantees about correctness. You can think of it as a halfway house between classic testing and formal verification:

Classic testing: A human comes up with some concrete example inputs for which they know the "right answers" (corresponding outputs). They write code that runs the code under test, gets its actual outputs, and compares them to the desired outputs.

Property testing: A human comes up with a precise way of randomly generating concrete example (input, desired output) pairs. They write some code to describe how to generate the pairs, often using a declarative DSL that describes only constraints on the inputs and outputs, with the understanding that anything not expressly forbidden is permitted, like "The input can be any list of between 0 and 100 integers each between -500 and 500" and "Every integer in the input must appear the same number of times in the output". They then write some more code (often a single line) to ask the computer to use this "spec" to randomly generate, say, 1000 such pairs, or as many pairs as can be checked in 1s. The computer generates the pairs itself, runs the code under test on each input and and checks its output matches the desired output.

Formal verification: A human comes up with a spec that typically describes conditions that must hold for all (input, output) pairs. This may look very similar to, or even exactly like the DSL used for property testing, though in general there are other conditions that can be expressed that cannot be checked with property testing even in principle -- for example, checking that the program always eventually terminates. The main difference is that the code under test is never actually run; instead, the computer analyses the source code itself to attempt mathematically prove that the stated conditions hold. How to actually accomplish this is a field of active research, but one basic approach is called "symbolic execution". To greatly simplify, if we forget about loops and conditionals for a moment, the idea is that we can write down things we know must be true after each statement executes, based on the things we knew must be true before it executed. So for example if x is a variable initially containing any integer (and we ignore overflow) then after the line

    x = x * x

runs, we know that x >= 0. To handle conditionals like

    if x > 50:
      x = 42

    something_afterwards(x)

the prover "forks" into two cases: One in which we know for certain that x > 50, one in which we know for certain that x <= 50. At the end of the if statement it then has the task of recombining what is known about the two cases. In this example, the first case lets us conclude that x = 42 by the end, while the second case lets us conclude that x <= 50 by the end, so it could conclude that x <= 50 either way by the time execution reaches something_afterwards(x). Handling loops is trickier but generally involves looking for invariants.