Consider a function that gets an array of integers and a positive number, and returns the sum of the array elements modulo the number. How can we prove using tests, that this always works for all possible values?

Not discounting the value of tests: we throw a bunch of general and corner cases at the function, and they will ring the alarm if in the future any change to the function breaks any of those.

But they don't prove it's correct for all possible inputs.

I would lean towards types and property testing here using tools like Coq.