To me, this article misses the mark.
The database is not your domain model, it is the storage representation of your domain model on disk. Your REST/grpc/whatever API also isn’t your domain model, but the representation of your domain model on the wire.
These tools (database, protocols) are not the place to enforce making invalid states un-representable for reasons the article mentions. You translate your domain model into and out of these tools, so code your domain model as separately and as purely as you can and reject invalid states during translation.
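A rough sketch of the kind of translation layer I mean (TypeScript; `UserRow`, `User`, and `parseUser` are hypothetical names, not from the article):

```typescript
// Storage representation: loose, mirrors whatever the table allows.
interface UserRow {
  id: number;
  email: string | null;
  status: string;
}

// Domain representation: pure, with invalid states unrepresentable.
type UserStatus = "active" | "suspended";
interface User {
  id: number;
  email: string;
  status: UserStatus;
}

// The translation boundary: invalid rows are rejected here, once.
function parseUser(row: UserRow): User {
  if (row.email === null) {
    throw new Error(`user ${row.id} has no email`);
  }
  if (row.status !== "active" && row.status !== "suspended") {
    throw new Error(`user ${row.id} has unknown status "${row.status}"`);
  }
  return { id: row.id, email: row.email, status: row.status };
}
```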
I disagree completely.
You’re already paying the cost of abstraction for using a certain database or protocol, so get the most bang for your buck. If you can encode rules in a schema or a type, it’s basically free, compared to having to enforce them in code and hoping that future developers (yourself or others) will remember to do the same. It just eliminates an entire universe of problems you’d otherwise have to deal with.
Also, while relaxing constraints is usually easy, or at least doable, enforcing new constraints on existing data is impossible in practice. I’ve never seen it done successfully.
The only exception to this rule I typically make is around data state transitions. Meaning that even when business rules dictate a unidirectional transition, it should be bidirectional in code, just because people will click the wrong button and will need a way to undo “irreversible” actions.
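Something like this (a TypeScript sketch; the order states and `allowedTransitions` map are made up for illustration):

```typescript
type OrderState = "draft" | "submitted" | "shipped";

// Business rules say draft -> submitted -> shipped, one way only.
// In code we still keep reverse edges so support staff can undo
// the inevitable mis-click.
const allowedTransitions: Record<OrderState, OrderState[]> = {
  draft: ["submitted"],
  submitted: ["shipped", "draft"], // "draft" is the undo path
  shipped: ["submitted"],          // undo a mistaken "shipped"
};

function transition(from: OrderState, to: OrderState): OrderState {
  if (!allowedTransitions[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```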
You can't gloss over the difference between "schema" and "type." Schemas exist at the edges between pieces of software. They govern how you talk to databases or APIs; they need to be forwards- and backwards-compatible. A type, conversely, exists within one program. You can update types and type invariants without needing to migrate state or make a breaking API change.
"Make invalid states unrepresentable" is much better advice for the internals of a program than it is for the sql or protobuf schemas at that program's margins. The point of "parse, don't validate" is to transform flexibly-represented external data into strongly-typed internal data that respects invariants; since you can update your internal data model much more easily than the models of external data, keeping external data representations flexible is sometimes just as important as keeping internal representations strict.
+1
In cases like electronics & protocols, it's very often a good idea to add an extra "reserved & unused" section for compatibility reasons.
These representations need not be 1:1 with the domain model. Different versions of the model might reject previously accepted representations in case of breaking changes. It's up to the dev to decide which conflict-reconciliation strategy to take (fallback values, reject, compute value, etc).
Working with a precise domain model (as in, no representable invalid states) is way more pleasant than a stringly-typed/primitives mess. We can just focus on domain logic without continuously second-guessing whether a string contains a valid user.role value or whether it contains "cat".
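E.g. a boundary parser like this (TypeScript sketch; the role names are hypothetical), which also shows the reconciliation choice above: reject, or substitute a fallback:

```typescript
type Role = "admin" | "editor" | "viewer";

// Parse once at the boundary; past this point a role can never be "cat".
function parseRole(raw: string): Role {
  switch (raw) {
    case "admin":
    case "editor":
    case "viewer":
      return raw; // narrowed to Role by the case labels
    default:
      throw new Error(`unknown role: ${raw}`); // or: return "viewer" as a fallback
  }
}
```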
No, this is wrong, and the database is the best example.
Very regrettably, RDBMSs (traditionally) have no support for the complete relational model, no way to represent nested relations, and nothing like what we have today with algebraic types.
That makes it impossible to TRULY model the domain (plus other factors related to SQL and such).
It's the same as old OOP languages, which were also incapable of doing so.
The INABILITY to TRULY model the domain is what is harmful. You have to bridge the impedance mismatch all the time, everywhere.
The second thing: What is the DOMAIN?
And here is where you are onto something: the DOMAIN of the database, the DOMAIN of the transport protocols, the DOMAIN of the GUI, etc. are DISTINCT domains!
It was VERY common in the past for the DBA (who actually understood RDBMSs, unlike most current devs) to know that he must model the DB in ways that support not only N apps (with different domains), hopefully providing an abstraction for them (in terms of VIEWS and functions), but also the operators.
The article points to this, but incorrectly says it's a problem of trying to make the invalid unrepresentable, when, if you have been doing DBs for decades, it's the total opposite.
For example, being unable to eliminate the mistake of NULL is a headache without end. Or being stuck using `1`/`0` as bools, or needing to fill a table with lots of NULLable columns because you can't represent an algebraic OR.
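To illustrate with made-up payment fields (TypeScript standing in for what the table can't express):

```typescript
// What the table forces on you: one nullable column per variant,
// and nothing stops both (or neither) from being set.
interface PaymentRow {
  card_number: string | null;
  iban: string | null;
}

// What an algebraic OR lets you say directly: exactly one variant.
type PaymentMethod =
  | { kind: "card"; cardNumber: string }
  | { kind: "bank"; iban: string };
```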
I'm not sure I agree.
My perfunctory reading is this: first you couple your state representation to the business logic and make some states unrepresentable (say, every client is category A, B, or C). Maybe you allow yourself some flexibility, like being able to add more client types.
Then the business people come and tell you some new client is both A and B, and maybe Z as well, quickly now, type type type.
And that there's a tradeoff between:
- eliminating invalid states, leading to fewer errors, and
- inflexibility in changes to the business logic down the road.
Maybe I misunderstood, but if this is right, then it's a good point. And I would add: when modelling some business logic, ask yourself how likely it is to change. If it's something pretty concrete and immovable, feel free to make the representation rigid. But if not, and even if the business people insist otherwise, look for ways to retain flexibility down the line, even if it means some invalid states are indeed representable in code.
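For a concrete (made-up) version of that tradeoff, in TypeScript:

```typescript
type Category = "A" | "B" | "C";

// Rigid: a client is exactly one category, enforced by the type.
interface RigidClient {
  category: Category;
}

// Flexible: a client can hold any combination, including ones the
// business hasn't asked for yet; some invalid states become representable.
interface FlexibleClient {
  categories: Set<Category>;
}
```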
IMO, rather than focusing on flexibility vs inflexibility when deciding on a "tight domain model" or not, it's much better to think about whether your program's requirements can tolerate some bugs.
Say we have a perfect "make illegal states unrepresentable" model. Like you said, it's kind of inflexible when requirements change: we need to change the affected code before we can even compile & run.
On the other hand, an untyped system is flexible. Just look at the Javascript & Python ecosystems: a function might contain a somewhat insane, gibberish statement, and your program will still run, only throwing an error at runtime.
Some bugs in programs like games or the average webapp don't matter that much. We can fix them later when users report them.
But it's probably better to catch at compile time whether a user can withdraw a negative balance, as we don't want to introduce an "infinite money glitch" bug.
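E.g. a branded-type sketch (TypeScript; the positivity check itself still runs at runtime, but the type keeps raw, possibly negative numbers from ever reaching `withdraw`):

```typescript
// A "branded" number that can only be produced by the checked constructor.
type PositiveAmount = number & { readonly __brand: "PositiveAmount" };

function positiveAmount(n: number): PositiveAmount {
  if (!(n > 0)) {
    throw new Error(`amount must be positive, got ${n}`);
  }
  return n as PositiveAmount;
}

// withdraw can no longer be handed a raw number.
function withdraw(balanceCents: number, amount: PositiveAmount): number {
  return balanceCents - amount;
}

withdraw(10_000, positiveAmount(2_500)); // ok
// withdraw(10_000, -2_500);             // rejected by the compiler
```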
Types are part of the picture, sure. But there's more. Essentially, I'd say, if your whole business-logic representation needs a major refactor every time the underlying business changes, then you'd better enjoy refactoring.
Yep. "Make invalid states unrepresentable" pairs well with "parse, don't validate"; the states that are valid for your business domain are (maybe) not the same as the states that are valid for your storage or wire format, so have different representations for these things.
> To me, this article misses the mark.
Yes, I agree. The blogger shows a fundamental misunderstanding of what it means to "make invalid states unrepresentable". I'll add that the state machine example is also pretty awful. The blogger lists use cases that the hypothetical implementation does not support, and the rationale for not implementing them was that "this can dramatically complicate the design". Which is baffling, as the scenario was built by complicating the design with "edge cases", but it's even more baffling when the blogger has the epiphany that "you need to remain flexible enough to allow some arbitrary transitions". As if the whole point were not to allow some transitions and reject all others that would be invalid.
The foreign key example takes the cake, though. Allowing non-normalized data to be stored in databases has absolutely no relation to the domain model.
I stopped reading after that point. The blog post is a waste of bandwidth.
To be fair to Sean (the post author), it does kind of make sense if you view "make invalid states unrepresentable" from a distributed-systems perspective (Sean's blog tends to cover this topic), as it's way more painful to enforce there.