> Given enough data, they can figure out what it means

Right, whereas something that isn't backwards compatible couldn't be figured out no matter how much data is given. Consider this line of Python:

   print(5 / 2)
There is no way you can know what the output should be. That is, unless we introduce synthetic context (i.e. a version number). Absent synthetic context we can reasonably assume that natural context is sufficient, and where natural context is sufficient, backwards compatibility is present.

> What we can readily make use of are mathematical results from long ago.

To some degree, but mostly we've translated the old notation into modern notation for the sake of familiarity. And certainly a lot of programming that gets done is exactly that: Rewriting the exact same functionality in something more familiar.

But like mathematics, while there may have been a lot of churn in the olden days when nothing existed before it and everyone was trying to figure out what works, programming notation has largely settled on what is familiar with some reasonable stability and no doubt will only continue to find even greater stability and it matures in kind.