Congratulations, you discovered a mutex.

Is it really a distributed system or just a bunch of services with a central database?

I don't think it's true that distributed and decentralized mean the same thing. A hub and spoke rail system is centralized, but it's still a distributed system, if it has multiple trains running concurrently.* A distributed system has to coordinate somehow, and a single central DB is one way of doing it.

*: Edit: maybe a better example here is a rail system with a single central dispatcher. It is centralized, but may still be distributed.

In fact - if you're building a very large distributed system the goal is usually to shrink that centralized component to the smallest and most robust surface you can. If the system is well designed it is amazing just how much consistency power you can get from a tiny component of centralization.

There are always tradeoffs, of course, but building a truly decentralized system requires some really difficult compromises on correctness. The Two Generals' Problem is a great piece of reading on this topic: distribution always requires compromises in general, but fully removing an authority on truth gets quite tricky.

I think DuckLake[1] is a terrific example of this. They said "look, let's build a lakehouse over S3, but for the bit that needs strong consistency (the manifest of which S3 blobs are in play), let's use Postgres". Postgres as a metadata catalog or control plane is brilliant for this, since you get strong consistency, and the scaling story for a metadata catalog is far different from that of the data volume you need to store. Use S3 for volume, Postgres for consistent metadata.

A similar pattern has spilled out of projects like WarpStream[2], which I suspect is using Postgres behind the scenes for their control plane.

[1]: https://ducklake.select

[2]: https://www.warpstream.com/

I have built and maintain a system that uses a very similar approach: we register artifacts with UUIDs in S3 in a strictly write-once, never-edit, never-remove fashion, and then store those UUIDs in Postgres. We simply juggle the connections between other model objects and UUIDs as needed, which lets us achieve safe guarantees without burdening the centralized system with the massive volume (these artifacts are often 50MB+ PDFs).

I will mention that I am quite fond of this approach, but it's good to be aware that introducing levels of abstraction like this does necessarily widen some fail points on the storage side: if your service uses multiple persistence stores, each additional store exposes yet another point where inconsistency could be introduced and/or a message could be lost. Still, fragmenting your data over multiple stores that are particularly well suited to their specialized usages can be huge for performance and cost.
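For anyone who wants to see the shape of this, here is a minimal sketch, not the actual system described above. It assumes an in-memory dict standing in for S3 and `sqlite3` standing in for Postgres; `WriteOnceBlobStore`, `open_catalog`, `publish`, and `fetch` are made-up names for illustration.

```python
import sqlite3
import uuid


class WriteOnceBlobStore:
    """In-memory stand-in for an S3 bucket that is only ever appended to."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Every upload gets a fresh UUID key, so nothing is ever overwritten.
        key = str(uuid.uuid4())
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def open_catalog() -> sqlite3.Connection:
    """SQLite stand-in for the Postgres metadata catalog."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE documents (name TEXT PRIMARY KEY, artifact_id TEXT NOT NULL)"
    )
    return db


def publish(db, store, name: str, data: bytes) -> str:
    # Upload the blob first, then commit the pointer. A crash in between
    # leaves an orphaned blob, but never a pointer to a missing blob.
    artifact_id = store.put(data)
    with db:  # one transaction: the only strongly consistent step
        db.execute(
            "INSERT OR REPLACE INTO documents (name, artifact_id) VALUES (?, ?)",
            (name, artifact_id),
        )
    return artifact_id


def fetch(db, store, name: str):
    row = db.execute(
        "SELECT artifact_id FROM documents WHERE name = ?", (name,)
    ).fetchone()
    return store.get(row[0]) if row else None
```

"Editing" a document is just a new upload plus a pointer swap in the catalog; the old blob stays untouched. The ordering in `publish` is the part that matters for the fail points mentioned above.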

If you use hashes of the content itself for your UUIDs, you'll (a) get deduplication and data consistency checking for free and (b) have basically implemented (a subset of) git that uses S3 backing instead of a local filesystem directory :)
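A rough sketch of what that could look like, assuming SHA-256 as the hash and a dict standing in for the S3 bucket (`ContentAddressedStore` is a hypothetical name, and this is nowhere near what git actually does internally):

```python
import hashlib


class ContentAddressedStore:
    """Blobs keyed by the SHA-256 of their own content."""

    def __init__(self):
        self._blobs = {}  # stand-in for an S3 bucket

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so a repeat upload is a no-op.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read: a mismatch means the stored bytes were corrupted.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"content does not match key {key}")
        return data
```

Uploading the same PDF twice gives back the same key and stores one copy, and any bit rot shows up as a hash mismatch on read.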

> The Two Generals' Problem is a great piece of reading on this topic

It is!

And the solution is to add an extra general on the left side. Let's call him Outus Boxus. The two generals on the left side can communicate in perfect lockstep. Then if you need the general on the right to find out about something, you can send a few workers to tell him or something...

More seriously though, you can have a DS for two reasons: tech or political.

Tech means scaling or reliability. So clients can be serviced by any of the nodes.

Political means different actors don't have a central authority. You can't stick two banks into one db.

This technique doesn't seem to address either aspect.

Exactly! It's a distributed system, with many processes performing work in parallel, with a central database as a coordination point, used as little as possible. A mutex wouldn't get quite the same performance :)

A more modern term: your system is a single "architectural quantum".

Neal Ford calls this a distributed monolith because a change to a database schema can break every single service at once, but there are very valid uses of this method.

There are decades of books on the footguns, as we used this pattern even back in the client-server days.

One suggestion I have is to research where the first version of SOA failed, especially as these systems tend to erode into Enterprise Service Buses.

Products like Apache Airflow tend to have value not because of the persistence layer, but because they force workflows into DAGs, which is an enforceable structural constraint, while SQL, being declarative, can sometimes force you into trying to enforce governance through observing behavior.

The former is not subject to Rice’s theorem, while the latter is.
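To make the "enforceable structural constraint" part concrete: checking that a workflow is a DAG is a plain graph check you can run before anything executes. A small sketch using Python's standard `graphlib` (the task names and the `validate_workflow` helper are invented for illustration; this is not how Airflow itself implements it):

```python
from graphlib import CycleError, TopologicalSorter


def validate_workflow(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid execution order, or raise CycleError if deps has a cycle.

    `deps` maps each task to the set of tasks that must run before it.
    """
    return list(TopologicalSorter(deps).static_order())


# A well-formed pipeline: extract, then transform, then load.
pipeline = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
order = validate_workflow(pipeline)

# A malformed one is rejected up front, without running any task.
try:
    validate_workflow({"a": {"b"}, "b": {"a"}})
    rejected = False
except CycleError:
    rejected = True
```

The check looks only at the structure of the graph, never at what the tasks do when they run, which is why it always terminates with a yes or no.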

If you actively control for these it will greatly increase the lifetime of this system before (or if) you reach the point you have to replace the system.

> Is it really a distributed system or just a bunch of services with a central database?

I've asked myself this question every single time I've had to use Zookeeper.

Apache Kafka being the poster child of the problem, with HBase in a close second.
