Must be some Mandela effect about some TDD documentation I read a long time ago.

If you test math_add(1,2) and it returns 3, you don't know if the code does `return 3` or `return x+y`.

It seems I might need to revise my view.

I vaguely remember the same advice; it's pretty old. How you use the randomness is test-specific. For math_add() it'd be something like:

  jitter = random.randint(1, 5)
  assertEqual(3 + jitter, math_add(1, 2 + jitter))
If it were math_multiply(), adding the jitter would fail; the jitter would have to be multiplied in instead.
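A minimal runnable version of the idea (math_add and math_multiply are stand-in implementations here):

```python
import random

def math_add(x, y):
    return x + y

def math_multiply(x, y):
    return x * y

# Jitter the inputs so a hard-coded `return 3` can't pass.
jitter = random.randint(1, 5)

# For addition, the jitter carries through additively.
assert math_add(1, 2 + jitter) == 3 + jitter

# For multiplication, the same additive jitter would fail;
# it has to be multiplied in instead.
assert math_multiply(2, 3 * jitter) == 6 * jitter
```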

Nowadays I think this would be done with fuzzing or property-based/constraint tests, where you define "this relation must hold" in a more structured way so the framework can choose random values, test more cases at once, and give better failure messages.
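A hand-rolled sketch of that structured style; real property-based frameworks (Hypothesis, QuickCheck) do the same thing but add input shrinking and much better failure reporting:

```python
import random

def math_add(x, y):
    return x + y

# State the relation once, let the harness pick the values.
def check_property(prop, n=100):
    for _ in range(n):
        x = random.randint(-10**6, 10**6)
        y = random.randint(-10**6, 10**6)
        assert prop(x, y), f"property failed for x={x}, y={y}"

# "This relation must hold": commutativity, and increments carry through,
# which rules out a hard-coded `return 3`.
check_property(lambda x, y: math_add(x, y) == math_add(y, x))
check_property(lambda x, y: math_add(x + 1, y) == math_add(x, y) + 1)
```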

Randomness is also useful when you expect your code to do the correct thing only with some probability. You test lots of different samples, and if they fail more often than expected, you should review the code. You wouldn't normally test add(x, y) with dynamic random samples against a fixed expected value, because you wouldn't expect it to always return 3, but in this case it wouldn't hurt.
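A sketch of that statistical style, using a hypothetical routine assumed to succeed about 95% of the time; the tolerance is set loose enough that a correct implementation almost never trips it:

```python
import random

# Hypothetical example: a routine expected to succeed ~95% of the time.
def flaky_lookup():
    return random.random() < 0.95

# Sample many runs; flag for review only if the failure rate clearly
# exceeds what a correct implementation would produce.
trials = 10_000
failures = sum(1 for _ in range(trials) if not flaky_lookup())
assert failures / trials < 0.07, "failure rate higher than expected; review the code"
```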