if it makes things flaky

then it actually is a huge success

because it found a bug you overlooked in both impl. and tests

at least if we're talking about unit tests

Only if it becomes obvious why it is flaky. If it's just sometimes broken but really hard to reproduce, then it just gets piled onto the background level of flakiness and never gets fixed.

Burma-shave