Conveniently, you can use the published results as equivalence tests, provide the ugly code as context, and regenerate it to your liking. I think the odds of such a regeneration introducing a bug that's within the usage domain but still dodges the golden tests are quite low... so long as you resist the urge to add features along the way.
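To make the idea concrete, here is a minimal sketch of what such a golden-test harness might look like in Python. Everything in it is hypothetical: `legacy_compound` stands in for the ugly original, `regenerated_compound` for the rewrite, and the entries in `GOLDEN_CASES` are made-up placeholders where the real published numbers would go.

```python
import math

# Hypothetical stand-in for the "ugly" original code.
def legacy_compound(principal, rate, years):
    total = principal
    i = 0
    while i < years:
        total = total + total * rate
        i = i + 1
    return total

# Hypothetical regenerated version: same behavior, cleaner form.
def regenerated_compound(principal, rate, years):
    return principal * (1.0 + rate) ** years

# Placeholder golden cases as (inputs, expected output) pairs.
# In practice these would be copied from the published results,
# not invented as they are here.
GOLDEN_CASES = [
    ((1000.0, 0.05, 1), 1050.0),
    ((1000.0, 0.05, 2), 1102.5),
    ((200.0, 0.10, 3), 266.2),
]

def check_against_golden(func, cases, rel_tol=1e-9):
    """Return a list of (args, expected, actual) for every mismatch."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if not math.isclose(actual, expected, rel_tol=rel_tol):
            failures.append((args, expected, actual))
    return failures

if __name__ == "__main__":
    for name, func in [("legacy", legacy_compound),
                       ("regenerated", regenerated_compound)]:
        failures = check_against_golden(func, GOLDEN_CASES)
        print(name, "OK" if not failures else failures)
```

The golden cases stay frozen while the implementation changes underneath them. That is also why adding features mid-rewrite is risky: new behavior falls outside what the published results can vouch for.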