Hacker News

I have always called this the “one true taxonomy” problem, because whenever you sit with multiple stakeholders in a room talking about a taxonomy, you can never get to agreement, because there is no such thing as the “one true taxonomy”.

Any hierarchical taxonomy classifies on one dimension at each taxonomic level. Invariably someone wants to classify on one criteria when someone else wants to classify on another. Taxonomies that humans use aren’t multi-dimensional. So if there is a disagreement, someone wins and someone(s) has to lose.

No one is wrong; they just have different priorities or preferences or goals.

So now as an architect I never argue (and seldom discuss) taxonomies. I make two points and then bow out:

1. Whatever your taxonomy is, you need a rubric for each level. You need a procedure or set of questions that unambiguously map any $THING you encounter into exactly one bucket. Validate that competent people with no specific domain knowledge can properly classify things with your rubric; it must be repeatable by amateurs, not just experts (software is dumb).

2. Existence trumps theory. If there exists a taxonomy and rubric for what you’re classifying, you need to provide a $DARN_GOOD_REASON why this wheel needs reinventing. Personal preference and your 1% edge case probably don’t justify all the work to reinvent everything.

Then, I go back to the implementers and tell them to design in a tagging system, which is a DIY taxonomy, and except in ridiculous use cases, I can make indexes make it fast enough to let everyone overlay their own classification system.

> Then, I go back to the implementers and tell them to design in a tagging system, which is a DIY taxonomy, and except in ridiculous use cases, I can make indexes make it fast enough to let everyone overlay their own classification system.

This 100x! I wish this were more common.

The key property of a tree is that there a unique path (address) for each element, which is a useful property in the implementation layer. But forcing that on users is a horribly leaky abstraction.

Ideally separate the low-level implementation from the interface, and allow users their own way to address content. I imagine object storage (with UUIDs or whatever) is often good enough for the lower layer. For the interface layer, tags are an improvement on categories (tree structure), but I think there's also room for more innovation (fuzzy matching, AI-driven interfaces, etc) that start by allowing trading-off precision for recall but then allow regaining precision by adding more approximate qualifiers to the filtering.

----

PS: Pushing this approach to 11/10...

An intriguing (crazy?) application of this idea would be: what if we did this to the concept of a codebase? Make it a database (with all the corresponding improvements over a filesystem) -- it's no longer a tree of files, and allow users to query code like "that foo which accepts a bar, frobnicates its internal state, and emits a mutated baz". Note that this might also solve the "naming things" problem.

This setup seems like a powerful abstraction for AI coding agents. All that back-end power (database >> filesystem) is something they can easily leverage, and they can also be built to robustly resolve your fuzzy queries into precise addresses, and then update the code based on your desired outcome.