I believe that "single source of truth" is a principle that should always be followed. If there's duplicated code where it'd be a bug if they diverge, then you should refactor. It creates a long-distance coupling in your code that may be invisible to future developers until a bug emerges.

But with that in mind, I mostly agree with the article: if it's not a violation of "single source of truth", then abstractions are just a convenience. If one starts being inconvenient, then it's not doing its job and there's no reason to use it. It's a serious code smell if a function needs several flags for custom behavior; that means it's probably the wrong abstraction or it's violating the single responsibility principle. If there is a legit need for lots of customization, an often-good way to handle it is to take a function/functor as an argument for the customization. E.g., rather than `solve(f:double -> double, max_iters = 99, x_abs_tol = 1e-15, x_rel_tol = 1e-15, ...)` you can do `solve(f:double -> double, stopping_criteria: StoppingCriteriaClass)`.
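Here is a minimal Python sketch of that idea. It assumes a secant-method root finder, since the comment doesn't say which algorithm `solve` uses. It also assumes a hypothetical `ToleranceCriteria` class that bundles the former flags. The solver only asks "should I stop?", so callers with unusual needs pass their own object instead of asking for another flag.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class StoppingCriteria(Protocol):
    """Anything with a should_stop() method can be passed to solve()."""

    def should_stop(self, iteration: int, x_prev: float, x: float) -> bool: ...


@dataclass
class ToleranceCriteria:
    """Hypothetical default that holds the knobs which used to be solve() flags."""

    max_iters: int = 99
    x_abs_tol: float = 1e-15
    x_rel_tol: float = 1e-15

    def should_stop(self, iteration: int, x_prev: float, x: float) -> bool:
        step = abs(x - x_prev)
        return (
            iteration >= self.max_iters
            or step <= self.x_abs_tol
            or step <= self.x_rel_tol * abs(x)
        )


def solve(
    f: Callable[[float], float],
    stopping_criteria: StoppingCriteria,
    x0: float = 0.0,
    x1: float = 1.0,
) -> float:
    """Secant-method root finder; all stopping policy is delegated to stopping_criteria."""
    iteration = 0
    while True:
        f0, f1 = f(x0), f(x1)
        if f1 == f0:  # flat secant line: no further step is possible
            return x1
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
        iteration += 1
        if stopping_criteria.should_stop(iteration, x0, x1):
            return x1
```

A caller that wants, say, exactly one step writes a tiny class with a `should_stop` method. The signature of `solve` never grows.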

> I believe that "single source of truth" is a principle that should always be followed

Fundamentally, the article addresses cases where it's not clear yet how many sources of truth there will be. Are the two spots in the code using the same algorithm, or slightly different versions? More importantly, will they change for the same sorts of reasons?

The title adage (correctly, imo) argues that making two different things the same will cause you more pain than making two same things different via duplication. In the latter case, the "damage" is just having to make the same changes twice, or doing a refactor to introduce the abstraction. In the former case, you have to keep adding to your abstraction, or undo it. Most crucially, it breaks "locality", which is the only property you really care about when making changes: I just want to make this change and not worry about side effects in unrelated parts of the system.

The issue with not having a single source of truth is not the fact that you have to update code in 2-3 places, it’s that you have to know to update code in 2-3 places.

Accidental divergence is the problem, not intentional.

Yes, this is true. And it is a bigger problem on large teams. One mitigation is a comment by the original author at both sites noting that the two may be coupled in the future.

But, again, the point is that you don't know yet whether you have a single source of truth or not. It's a question of the relative badness of duplication vs premature abstraction in cases where the code may diverge or converge in the future. There is no generic answer. But as a heuristic, based on my personal experience, I have always found premature abstractions to be more painful to work with. Even more so when someone else has authored them.

In my experience, a lot of the time this comes down to coders thinking the logic is the same and abstracting it into a central source, when from a business perspective the rules are similar but actually different.

So many times I've had to untangle these types of abstractions when the business asks for changes to case X but not case Y. Or worse, the business asks for changes to case X, but they also affect case Y because of the abstraction. The business sees X and Y as different things, so they didn't even think to mention that the new behavior should only affect case X; to coders, they're the same.

For pure logic I find refactoring to enable divergence much easier than implementing convergence.

Not only is finding call sites easier than finding copies, it's also more intuitive to start looking. "Which callers will be affected by the change?" is the most natural question to ask. "Which places should have this same change applied?", not so much.

This sometimes falls under “be cautious with what you output, but generous (i.e. flexible) or very careful (full validation, good logging, making sure you fail safe upon receiving any/all unexpected input) with what you accept”. This usually makes duplication the worst choice, because you could have to do a lot more thinking (and maybe coding) down the line to make sure all is well everywhere, and you need to document (or at least comment) so that others know these requirements when they make future changes. But it can be a valid approach, especially in related but loosely coupled parts.
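Here is a small, hypothetical illustration of that principle for a boolean setting. The parser accepts several spellings but logs and rejects anything else rather than guessing, and the formatter always emits one canonical form. `parse_flag`, `format_flag`, and the accepted spellings are all invented for this sketch.

```python
from __future__ import annotations

import logging

log = logging.getLogger(__name__)

# Spellings we are willing to accept on input (hypothetical list).
_TRUE = {"true", "yes", "y", "1", "on"}
_FALSE = {"false", "no", "n", "0", "off"}


def parse_flag(raw: str) -> bool:
    """Be flexible about spelling and whitespace, but never guess on unknown input."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail safe: log and reject rather than silently defaulting to True or False.
    log.warning("rejecting unexpected flag value %r", raw)
    raise ValueError(f"unexpected flag value: {raw!r}")


def format_flag(value: bool) -> str:
    """Be strict on output: always emit exactly one canonical spelling."""
    return "true" if value else "false"
```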

This assumes the bug exists in both places, which might not be true at all, even if they both depend on the same duplicated code.

If you only spot the bug in path A and not path B, why fix the bug for B?

You still need to know to assess B to make sure that it is not affected, and verify that it is not adversely affected if it interacts with the output of A after you have changed it.

Why bother having to reason out if path B is or is not buggy? Instead of potentially getting that analysis wrong, DRY, fix it in the one place, be sure that it's fixed for that case, and move onto the next bug.

You forgot the fourth place.

No, the issue isn't when there are two or three places; it's when there are hundreds or even thousands of different places. Two or three is annoying, but not a big deal. However, as you get into the hundreds and thousands, it becomes a real problem. In real-world code, this is an all too common case.

Seems like this is a problem almost entirely solved by an LLM + vector database setup.

I don't follow. Will this help to identify duplicate code? FYI: JetBrains' IntelliJ has had this feature built in for years now.
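To give a rough idea of what textual duplicate detection involves, here is a deliberately naive sketch using Python's standard `difflib`. It is not how IntelliJ or an embedding-based setup works. It compares the whitespace-normalized lines of every pair of snippets and reports pairs above a similarity threshold. The 0.9 default threshold is an arbitrary assumption.

```python
from __future__ import annotations

import difflib
from itertools import combinations


def _normalize(code: str) -> list[str]:
    """Drop blank lines and surrounding whitespace so indentation changes don't matter."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def near_duplicates(
    snippets: dict[str, str], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, similarity) for every pair at or above threshold."""
    found = []
    for (name_a, code_a), (name_b, code_b) in combinations(snippets.items(), 2):
        ratio = difflib.SequenceMatcher(None, _normalize(code_a), _normalize(code_b)).ratio()
        if ratio >= threshold:
            found.append((name_a, name_b, ratio))
    return found
```

Purely textual matching like this misses renamed variables and flags unrelated boilerplate. That is the gap smarter tools try to close.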

Sometimes it is genuinely easier to duplicate when that happens - e.g. if three teams maintain an enum with 4 values and there is no existing mechanism for sharing code between the projects.

One killer life hack I’ve found: if extreme duress pushes software into two sources of truth, add a CI test that won't let changes merge into main until the sources match. The canonical case where this is actually the best solution is pyproject.toml / requirements.txt synchronization, but I suspect it has broader applicability. A precondition is that things have already gone off the rails far enough that a single source of truth is unattainable; this is more harm reduction than cure.
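Here is a minimal sketch of such a check in Python. It assumes the rule is "every distribution named in `[project].dependencies` appears in requirements.txt, and vice versa". Versions are ignored, since requirements.txt is often pinned. In a real CI job you would load the dependency list from pyproject.toml with the standard `tomllib` module (Python 3.11+) and fail the job when the result is non-empty. The function names here are hypothetical.

```python
from __future__ import annotations

import re

_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_COMMENT = re.compile(r"(^|\s+)#.*$")


def normalize_name(requirement: str) -> str:
    """Distribution name from a requirement string, normalized PEP 503 style."""
    match = _NAME.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def requirement_lines(requirements_txt: str) -> list[str]:
    """Requirement lines, skipping blanks, comments, and pip options such as -r or -e."""
    lines = []
    for raw in requirements_txt.splitlines():
        line = _COMMENT.sub("", raw).strip()
        if line and not line.startswith("-"):
            lines.append(line)
    return lines


def out_of_sync(pyproject_deps: list[str], requirements_txt: str) -> set[str]:
    """Names present in only one of the two sources; an empty set means they agree."""
    declared = {normalize_name(dep) for dep in pyproject_deps}
    listed = {normalize_name(line) for line in requirement_lines(requirements_txt)}
    return declared ^ listed
```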

I know it is just an example but I'd generate one of those files from the other in that case.

> I believe that "single source of truth" is a principle that should always be followed

Theoretically and conceptually I agree. But in practice, a lot of programming languages aren’t as expressive. People prefer codebases with duplication rather than visitor patterns everywhere. In essence, the visitor pattern is a tool to solve multi-dimensional abstraction problems, just like type classes in Haskell or CLOS in Common Lisp. But it’s so verbose and indirect that more often than not it’s not worth it, even when conceptually it’s a legit case for “single source of truth”.

The visitor pattern is there for a very simple reason. You have n datatypes with m functions. FP languages make adding a new row (a new function) to this n×m table easy; OOP languages make adding a new column (a new datatype) easy (that is, without changing every use site as well).

Visitor pattern makes the row addition case possible for OOP languages, that's it.
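Here is a minimal Python sketch of that table, with shapes as the datatypes and area/perimeter as the functions. All names are illustrative. Adding a new operation means writing one new visitor class without touching `Circle` or `Rect`. Adding a new shape, by contrast, means adding a method to every visitor. Python could also use `functools.singledispatch` or `match` here, so the pattern is shown only for its structure.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ShapeVisitor(ABC):
    """One subclass per operation (a "row" in the n x m table)."""

    @abstractmethod
    def visit_circle(self, shape: Circle) -> float: ...

    @abstractmethod
    def visit_rect(self, shape: Rect) -> float: ...


class Shape(ABC):
    @abstractmethod
    def accept(self, visitor: ShapeVisitor) -> float: ...


@dataclass
class Circle(Shape):
    radius: float

    def accept(self, visitor: ShapeVisitor) -> float:
        return visitor.visit_circle(self)  # double dispatch: the shape picks the method


@dataclass
class Rect(Shape):
    width: float
    height: float

    def accept(self, visitor: ShapeVisitor) -> float:
        return visitor.visit_rect(self)


class Area(ShapeVisitor):
    def visit_circle(self, shape: Circle) -> float:
        return math.pi * shape.radius ** 2

    def visit_rect(self, shape: Rect) -> float:
        return shape.width * shape.height


# Adding this operation touched no shape class.
class Perimeter(ShapeVisitor):
    def visit_circle(self, shape: Circle) -> float:
        return 2 * math.pi * shape.radius

    def visit_rect(self, shape: Rect) -> float:
        return 2 * (shape.width + shape.height)
```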

> it'd be a bug if they diverge

That's a very nice rule of thumb. I've often overabstracted when two pieces of code looked similar at one point in time and then diverged.

Of course, in theory this is true. In practice people tend to avoid ANY duplication no matter what. Especially junior developers, as if duplication would be the root of all evil.

> as if duplication would be the root of all evil

And instead it gets replaced with the actual root of all evil, complexity.

To be more specific, incidental complexity.

Many problems have tons of inherent complexity already.

We still need a way to track that there’s some common pattern in the code, so that when we update one pattern we wonder about the other places in code with the same pattern. Avoiding duplication doesn’t solve that.

My metric for that is "does that code MEAN the same thing" or "does it just look the same". It has worked quite well for me so far. I frequently find myself making a copy of some code rather than adding a parameter (most commonly with code that would otherwise get some flag added).

Me too! I don't follow DRY that much. I'm aware that copy-pasting is good enough for a few weeks or months to see how things evolve, and I refactor when it's really needed. That said, how do you know if they mean different things? For GUI code, for example, they do mean the same thing, but there's a good chance the code will evolve in the future, so premature refactors are wasted time.

GUI code changes as fast as your GUI does. If you have two buttons, call makeButton twice. If they have totally different sizes, don't calculate the size inside makeButton. If tomorrow you want a button and a checkbox, don't call makeButton twice with isCheckbox=true the second time.
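Here is a hypothetical, toolkit-agnostic sketch of that advice. The caller chooses the sizes rather than `make_button` computing them. The checkbox gets its own constructor instead of an `is_checkbox` flag. The `Button`, `Checkbox`, and `Size` types are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: int
    height: int


@dataclass(frozen=True)
class Button:
    label: str
    size: Size


@dataclass(frozen=True)
class Checkbox:
    label: str
    checked: bool = False


def make_button(label: str, size: Size) -> Button:
    # The caller chooses the size; nothing here tries to compute it for every case.
    return Button(label, size)


def make_checkbox(label: str, checked: bool = False) -> Checkbox:
    # A separate constructor rather than make_button(..., is_checkbox=True).
    return Checkbox(label, checked)


ok = make_button("OK", Size(80, 24))
cancel = make_button("Cancel", Size(120, 24))
remember_me = make_checkbox("Remember me")
```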

Fun fact: Win32 checkboxes are buttons with a bitflag that says they are actually checkboxes.

Mostly by looking at the calling site where the code is already used and the calling site where I want to reuse it. If both of those mean the same thing (calculating the tax on x products for the purpose of applying it to the shopping cart vs. for generating reports), then I'll reuse it, in most cases if it can be achieved without adding stuff like flags. In other cases, the code just looks the same (sum some field and calculate a percentage of that, e.g. for discounts vs. taxes on products) where it's obvious that they don't mean the same thing. (Though I do heavily rely on a good type system to deal with future evolutions of that copied code.)
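Here is a sketch of that judgment in Python, with invented types and names. Tax is computed by one function that both the cart and the tax report call, because both uses mean the same thing. The discount has identical arithmetic today but is kept as a separate copy with its own rate type. A static checker such as mypy can then flag a `DiscountRate` passed where a `TaxRate` is expected (Python itself does not enforce this at runtime).

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    price: Decimal
    quantity: int


@dataclass(frozen=True)
class TaxRate:
    percent: Decimal


@dataclass(frozen=True)
class DiscountRate:
    percent: Decimal


def subtotal(products: list[Product]) -> Decimal:
    return sum((p.price * p.quantity for p in products), Decimal("0"))


def tax_for(products: list[Product], rate: TaxRate) -> Decimal:
    # Shared: the cart and the tax report mean the same thing by "tax".
    return subtotal(products) * rate.percent / 100


def discount_for(products: list[Product], rate: DiscountRate) -> Decimal:
    # Deliberate copy: same arithmetic as tax_for today, different meaning.
    return subtotal(products) * rate.percent / 100


def cart_total(products: list[Product], tax: TaxRate, discount: DiscountRate) -> Decimal:
    return subtotal(products) + tax_for(products, tax) - discount_for(products, discount)


def tax_report_line(products: list[Product], tax: TaxRate) -> str:
    return f"tax collected: {tax_for(products, tax)}"
```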

TL;DR: Vibes

It's always about how far into the future you plan. And sometimes this future thinking is wasted time.

This right here.

Here we're loading the customer record and updating their discount %

Here we're loading the broker record and updating their commission %

They will have 99% identical code.

It's possible but exceedingly unlikely that we have found two things that should be a `load_record_and_update_percent(file,id,field,val)`.

Tomorrow the business logic behind one of those will no longer be a simple % and now you have a real mess.
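Here is a hypothetical sketch of how those two copies might look "tomorrow". In-memory dicts stand in for the record store. Both functions would have started as near-identical duplicates. Once broker commission gains an invented volume bonus, only its function changes. A shared `load_record_and_update_percent(file,id,field,val)` would instead need a special case or a flag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    discount_percent: float


@dataclass
class Broker:
    id: int
    commission_percent: float
    monthly_volume: float = 0.0


# Hypothetical in-memory stores standing in for "loading the record".
customers = {1: Customer(1, 5.0)}
brokers = {7: Broker(7, 2.0, monthly_volume=250_000)}


def update_customer_discount(customer_id: int, percent: float) -> Customer:
    customer = customers[customer_id]
    customer.discount_percent = percent
    return customer


def update_broker_commission(broker_id: int, base_percent: float) -> Broker:
    # Started as a copy of update_customer_discount; then the (invented) rule
    # changed: high-volume brokers get an extra 0.5 points.
    broker = brokers[broker_id]
    bonus = 0.5 if broker.monthly_volume >= 100_000 else 0.0
    broker.commission_percent = base_percent + bonus
    return broker
```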

> when we update one pattern we wonder about the other places in code with the same pattern. Avoiding duplication doesn’t solve that

It can, that's all about how aggressively you factor and structure your code, eg. combinators make it easy to reuse code in different application patterns without rewriting.

In which language do you use combinators for that?

Even in that case, the refactor can introduce mental overhead when there are too many different variable/property names.

Any language that I can write a combinator in. It's quite easy in C, for example.
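Here is a small Python illustration of the idea. The commenter mentions C; Python is used here only for brevity. `compose`, `when`, and `each` are hypothetical combinators. Each captures one reusable pattern, and new behavior is built by plugging existing functions together rather than by copying and editing loops.

```python
from __future__ import annotations

from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[B], C], g: Callable[[A], B]) -> Callable[[A], C]:
    """compose(f, g)(x) == f(g(x))."""
    return lambda x: f(g(x))


def when(pred: Callable[[A], bool], f: Callable[[A], A]) -> Callable[[A], A]:
    """Apply f only to values that satisfy pred; pass the rest through unchanged."""
    return lambda x: f(x) if pred(x) else x


def each(f: Callable[[A], B]) -> Callable[[list[A]], list[B]]:
    """Lift a per-item function to a whole list."""
    return lambda xs: [f(x) for x in xs]


def double(x: int) -> int:
    return x * 2


# The same small pieces reused in two different pipelines.
clamp_negative = when(lambda x: x < 0, lambda x: 0)
double_all_non_negative = each(compose(double, clamp_negative))
clamp_all = each(clamp_negative)
```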

Exactly!

The hardest part is two algorithms or business logic routines that are nearly identical. What to do? Frequently, all solutions look equally bad!

This is something I've seen repeated time and time again as a criticism of (misused) abstraction and DRY, yet I've never seen ONCE -- and this is not hyperbole, I mean it literally -- a junior making an abstraction with any thought to reuse, generalizing anything, or caring about not repeating code. Most juniors I've worked with are content to just churn out new code without paying attention to the codebase at all. This was all before the AI deluge, mind you.

Very similar with patterns. I've often read people protesting that juniors overuse design patterns, yet I've seldom seen a junior (mis)use anything more complex than a singleton, and when they use any pattern, it's usually forced upon them by an opinionated Java framework.

This smells more like the fluidity of what people mean by “junior” than anything else. Journeymen engineers in their over-engineering phase, or even very “senior” expert programmers, can suffer from overfitting the product to their own mental model. The most senior judgment is to understand when an abstraction makes sense at a customer level, because that defines the durability of a business-logic abstraction.

I do agree this happens with the senior overengineering phase, but the comment I replied to mentioned "especially juniors" and I've heard this trope specifically about juniors, with the implication they want to apply what they learned in college, but this hasn't been my experience at all.

In the early 2000s I often saw juniors and students make staggeringly deep class hierarchies. The equivalent of:

Shape::Polygon::ConvexPolygon::FourSidedConvexPolygon::Square::BlueSquare...

"Intro to OOP" lectures/articles made a deep impression on some people in not quite the right way :)

I was probably that guy! It was all the rage 20 years ago, including worrying about the diamond inheritance problem. What is the equivalent in the current generation? ORM that no one can maintain? Unnecessary dev ops complexity? Anything "web scale"?

Are ORMs still a thing? I've been away from OOP for some years now, but just when I was leaving it, there was a trend firmly against ORMs... my guess was that they were on their way out, replaced by more lightweight libs and frameworks? Or did they make a comeback?

Regarding OOP itself, I also remember when "favor composition over inheritance" became a thing. Was this reversed too?

I love an ORM. I think many of the problems people experience with ORMs, OOP, and RESTful routes come from getting the domain model wrong. When you model the data correctly, you don’t need complex queries that push ORMs beyond their breaking point.

I think it's more complex than just about getting the domain model wrong. ORMs introduce tradeoffs and are inherently complex and full of caveats (both when deciding to use one or not), as amusingly pointed out in the much-discussed article from 2006: "[Object-Relational Mapping is] The Vietnam of Computer Science" [1]

----

[1] https://archive.is/QVPj (excuse the archive link, Ted Neward's blog seems now lost to linkrot).

> Regarding OOP itself, I also remember when "favor composition over inheritance" became a thing. Was this reversed too?

I think this is generally still the advice, when working in OOP contexts.
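Here is a minimal sketch of that advice applied to the `BlueSquare`-style hierarchy above, with invented names. Instead of one subclass per combination of geometry and color, a shape holds a geometry component and a color field. A red hexagon then needs no new class.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RegularPolygon:
    """Geometry component: n equal sides of a given length."""

    sides: int
    side_length: float

    def area(self) -> float:
        return self.sides * self.side_length ** 2 / (4 * math.tan(math.pi / self.sides))


@dataclass(frozen=True)
class StyledShape:
    """Composition: a shape has a geometry and a color instead of inheriting them."""

    geometry: RegularPolygon
    color: str

    def area(self) -> float:
        return self.geometry.area()  # delegate instead of inheriting


blue_square = StyledShape(RegularPolygon(sides=4, side_length=2.0), color="blue")
red_hexagon = StyledShape(RegularPolygon(sides=6, side_length=1.0), color="red")
```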

I was working at that time and never saw this from juniors. Overeager seniors and architecture astronauts, sure. But juniors? They mostly copy pasted code without even taking a second look at the codebase, and without bothering to break functions in any sensible way.

Mind you, I mean enterprise and line of business software, not hobbyists. I also mean of their own volition, not the kind of nonsense that Java frameworks often forced on them (all the patterns under the rainbow, factory abstract method factory of abstract methods).

> Very similar with patterns. I've often read people protesting that juniors overuse design patterns, yet I've seldom seen a junior (mis)use anything more complex than a singleton, and when they use any pattern, it's usually forced upon them by an opinionated Java framework.

I've seen it occasionally. There was one junior whose code was littered with DTOs that were exact copies of the business objects, and DAOs where every method was just a wrapper for a Hibernate method. But yeah, it's rare.

Win32 checkboxes, radiobuttons and groupboxes are buttons with extra bitflags. What's the common denominator? They all have text and do something when you click on them. Except groupboxes, which don't do something when you click on them.

Were you the same when you were a junior? I was. I didn't have the experience to understand the impact of my changes. The standard reply on HN: "You need more mentoring or code review." Sometimes (usually?) that is in short supply.

Absolutely. I made all the usual mistakes, and had to be mentored and learn from more experienced programmers.

(Alas! Sometimes you pick up bad habits from experienced people, and being a junior, you don't know better)

Definitely the hallmark of a junior: obsession with code deduplication as the highest priority, when it’s actually quite low among the others.

Well I have seen a lot of „expert beginners” who have years of experience on paper but fight tiny duplications like their life depends on it.

„How Software Groups Rot: Legacy of the Expert Beginner”.

https://daedtech.com/how-software-groups-rot-legacy-of-the-e...

Thank you for this link.

I have recently fallen into a job at a small company that really seems to have this culture. Thankfully, I'm only going to be here for a year and a half or so (a fixed-term job on a working holiday visa), but I'm trying to be really aware of how it's impacting my career development.

There is no automated testing, no meetings, seemingly no code review process, no standardization of schemas for files that are passed between different applications, and all jobs are run on on-prem desktop workstations.

With LLMs the cost of duplication is much lower and LLMs

> and LLMs

… sometimes duplicate things unnecessarily.

or stop midsentence

When you run out of tokens, you run out of tokens!

The struggle is real.

Would you like me to outline some concrete steps for dealing with the struggle?

We would love to, but we ran out of tokens.

If you knew in advance which source of truth is important to isolate you don’t have this problem.

The problem is not knowing which of the hundreds or thousands of potential truth sources is worth abstracting. The only real way of finding out is not abstracting them and seeing how it works out.

If the problems in SWE boiled down to solve(f -> MagicallyNoProblemAnymore) we wouldn’t have this discussion.

Code duplication differs from single source of truth applied to data. Data is data, but two pieces of code may be functionally the same (they do the same thing) while being semantically different in their usage (they’re advertised to achieve different things). In that case, coupling them together through deduplication and forcing them to do the same thing doesn’t really make sense, and may make the codebase more difficult to work on in the future (especially in companies where different teams have responsibility for different parts).

code and data are the same thing.

but that's too philosophical to talk about or for you to understand.

Put it this way. You're implying code can be duplicated as long as the copies are advertised to do different things. But can't that conceptually be applied to data as well? I have the number 5 representing age, and I also have the number 5 duplicated somewhere else representing cost. 5 is duplicated because the two are "advertised" to do different things.

Because code and data are philosophically the "same", the properties of "single source of truth" apply to both in the same way.

> If they diverge

This is the key, if they are very similar but used by different consumers the chance that they will diverge in the future is very high. And once they do they will break the abstraction.

I don’t think anything in the article advocates for not prioritizing “single source of truth”, as in, if we know that there are multiple sources of truth for something, they should absolutely be deduped. The article is more saying “be a bit more skeptical of any two pieces of code actually representing the same thing” and “be more willing to break apart an abstraction that is trying to represent multiple truths.”

I have always believed what the article more or less states. But you have to remember, the primary and maybe only source of duplication in software is situational dependency (the other word escapes me for this). If there were a universal tree of software functions that could be accessed over a network, no function would ever be duplicated and every function would be reused from a central tree. When you put 2+2, or any other code, inside a method or function body, you just duplicated code.

This is why we end up with programs that duplicate code, whether it's something as simple as adding two numbers together or complex, bug-prone logic that someone already wrote better 40 years ago. Code reuse mostly happens on a very small scale.

Given that's the case, when you start a new React project, for example, you are not reusing application code; you are duplicating the React framework so you can duplicate every other web app in every sense except maybe the visual.

There is no such thing as full reuse, and until we get to a universal, network-invocable function tree that can be extended only when something is truly unique, we never will. Maybe AI will do this. People cannot.

At the end of the day, code duplication needs to exist to optimize for local correctness (or incorrectness) and speed, and an abstraction's goal is not to provide pure reuse. It's to provide a place to "put your logic" that may be similar and has access to the typical state that some kind of widget might need.