Hacker News

The issue with not having a single source of truth is not the fact that you have to update code in 2-3 places, it’s that you have to know to update code in 2-3 places.

Accidental divergence is the problem, not intentional.

jonahx a day ago

Yes, this is true, and it is a bigger problem on large teams. One mitigation is a comment by the original author at both sites noting that there may be a coupling in the future.

But, again, the point is that you don't know yet whether you have a single source of truth or not. It's a question of the relative badness of duplication vs premature abstraction in cases where the code may diverge or converge in the future. There is no generic answer. But as a heuristic, based on my personal experience, I have always found premature abstractions to be more painful to work with. Even more so when someone else has authored them.
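To make the comment-at-both-sites mitigation concrete, here is a minimal sketch, assuming two intentionally duplicated functions. The function names, the discount rule, and the `COUPLED-WITH` tag are all invented for illustration; the tag is just a convention a team could grep for, not a feature of any tool.

```python
# Hypothetical example: two deliberately duplicated rules, each pointing at the other.

def retail_discount(total_cents: int) -> int:
    # COUPLED-WITH: wholesale_discount (same rule today, may diverge later).
    # If you change the threshold here, check whether the other site needs it too.
    return total_cents // 10 if total_cents >= 10_000 else 0


def wholesale_discount(total_cents: int) -> int:
    # COUPLED-WITH: retail_discount (same rule today, may diverge later).
    # If you change the threshold here, check whether the other site needs it too.
    return total_cents // 10 if total_cents >= 10_000 else 0
```

The comment does not prevent divergence; it only turns "you have to know to look" into something a reader of either site is told.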

Maxion 17 hours ago

A lot of the time, in my experience, this comes down to coders thinking the logic is the same and abstracting it into a central source, when from a business perspective the rules are similar but actually different.

So many times I've had to untangle these types of abstractions when the business asks for changes to case X but not case Y. Or worse, the business asks for changes to case X, but the change also affects case Y because of the abstraction. The business sees X and Y as different things, so nobody even thought to mention that the new behavior should only affect case X, but to coders they're the same.
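A rough sketch of that second failure mode, with every name and number made up for the example: a shared late-fee helper gets a grace period because the business asked for it in case X, and case Y quietly changes too.

```python
# All names and numbers here are invented for illustration.

def late_fee_v1(days_late: int) -> int:
    """Shared helper, originally used by both case X and case Y."""
    return 5 * days_late


def late_fee_v2(days_late: int) -> int:
    """The same helper after the business asked for a 3-day grace period for X."""
    return 5 * max(0, days_late - 3)


def invoice_x(days_late: int, late_fee=late_fee_v2) -> int:
    return 100 + late_fee(days_late)


def invoice_y(days_late: int, late_fee=late_fee_v2) -> int:
    # Nobody asked for Y to change, but it calls the same helper.
    return 250 + late_fee(days_late)


print(invoice_y(2, late_fee=late_fee_v1))  # 260: Y's behavior before the change
print(invoice_y(2))                        # 250: Y silently picked up X's grace period
```

The `late_fee` parameter exists only so the before and after can be shown side by side; in a real codebase the helper would simply have been edited in place.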

ytoawwhra92 a day ago

For pure logic I find refactoring to enable divergence much easier than implementing convergence.
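A small sketch of what "refactoring to enable divergence" can look like, assuming a shared pure function with two callers. The names and the `@` rule are hypothetical; the point is only the shape of the change.

```python
# Hypothetical names throughout.

def normalize_username(raw: str) -> str:
    """Shared pure logic with two callers: signup and search."""
    return raw.strip().lower()


# Search later needs to diverge (say, it should also drop one leading "@").
# The call sites are easy to list, so the diverging caller gets its own
# function built on the shared one, and signup is left untouched.
def normalize_search_query(raw: str) -> str:
    name = normalize_username(raw)
    return name[1:] if name.startswith("@") else name
```

Going the other way would mean first discovering every copy of `raw.strip().lower()` scattered around the codebase, including the ones that have already drifted slightly.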

usrusr 18 hours ago

Not only is finding call sites easier than finding copies, it is also more intuitive to know where to start looking. "Which callers will be affected by the change?" is the most natural question to ask. "Which places should have this same change applied?", not so much.

dspillett a day ago

This sometimes falls under “be cautious with what you output, but generous (i.e. flexible) or very careful (full validation, good logging, making sure you fail safe upon receiving any/all unexpected input) with what you accept”. That usually makes duplication the worst choice: you may have to do a lot more thinking (and maybe coding) down the line to make sure all is well everywhere, and you need to document (or at least comment) these requirements so that others know about them when they make future changes. It can still be a valid approach, especially in related but loosely coupled parts.

ketozhang a day ago

This assumes the bug exists in both places, which might not be true at all even if both depend on the same duplicated code.

If you only spot the bug in path A and not path B, why fix the bug for B?

dspillett a day ago

You still need to know to assess B to make sure that it is not affected, and verify that it is not adversely affected if it interacts with the output of A after you have changed it.

fragmede a day ago

Why bother having to reason out whether path B is or is not buggy? Instead of potentially getting that analysis wrong, DRY, fix it in the one place, be sure that it's fixed for that case, and move on to the next bug.

pjio 14 hours ago

You forgot the fourth place.

bluGill a day ago

No, the issue is not when there are two or three places; it's when there are hundreds or even thousands of different places. Two or three is annoying, but not a big deal. However, as you get into the hundreds and thousands, it becomes a real problem. In real-world code, this is an all too common case.

dilyevsky a day ago

Seems like this is a problem almost entirely solved by an LLM + vector database setup.

throwaway2037 a day ago

I don't follow. Will this help to identify duplicate code? FYI: JetBrains' IntelliJ has had this feature built in for years now.

pydry a day ago

Sometimes it is genuinely easier to duplicate when that happens - e.g. if three teams maintain an enum with 4 values and there is no existing mechanism for sharing code between the projects.