Exclusion constraints should work well for this case, at any transaction isolation level.

https://www.postgresql.org/docs/current/ddl-constraints.html...

That's indeed the first thing you would want to try. In this case, though, the time ranges can repeat according to a formula, like a cron job for example.

An exclusion constraint needs concrete values to compare, but here we can't pre-compute and index every future occurrence (there are infinitely many).

We solve a Diophantine equation for this check (if Ax - By = C has an integer solution, where C is the offset between the two start times, then formulas A and B can conflict at some point).
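Concretely, that solvability test boils down to a divisibility check. A minimal sketch in Python (naming is mine, and the real check also has to account for durations and end dates):

```python
from math import gcd

def starts_can_coincide(start_a: int, period_a: int,
                        start_b: int, period_b: int) -> bool:
    """Does start_a + x*period_a == start_b + y*period_b have an integer
    solution? Rearranged: A*x - B*y = start_b - start_a, a linear
    Diophantine equation, solvable iff gcd(A, B) divides the right side."""
    return (start_b - start_a) % gcd(period_a, period_b) == 0
```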

This seems like a fascinating problem domain! I'm incredibly curious what product you're working on - if you're paying this much attention to detail on recurring events, it seems like a step beyond most scheduling systems I've seen.

Heck, https://www.sciencedirect.com/science/article/pii/S147466701... and https://www.amazon.com/Declarative-Models-Concurrent-Cyclic-... seem to indicate this is still an area of active research. And, to your point, almost certainly too complex to try to encode into a Postgres index - that would be a paper-worthy project unto itself!

(I've had to implement the naive approach here, which nested-loops over dates and rules to find ones that might conflict. It's definitely not meant to scale, and would be a nightmare at anything finer than date-level granularity!)

No pressure to share, but would love to learn more!

It's a scheduling system in the medical field where recurring events can happen in a particular room: this might be every N days at a particular time, or every three weeks on Tuesday and Friday, between some start date and some end date.

I'm not entirely familiar with the literature, so we made some simplifications to keep it manageable, and it's very possible there's a better approach!

We model the events generated by our recurring rules as intervals [SA+n*A, SA+n*A+DA), where A is a constant integer number of days (e.g. A=3*7 for an event on Tuesday every 3 weeks), SA is the start date, and DA is the duration of an event (... which can be >24h to make things more interesting). It might continue recurring forever, or it might stop by some end date.
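In code, the model looks something like this (a minimal Python sketch with my own naming; plain integers for time, e.g. hours since some epoch, to keep DST out of the picture for now):

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class Recurrence:
    start: int                 # SA: start of the first occurrence
    period: int                # A: distance between occurrence starts
    duration: int              # DA: length of one occurrence (can exceed 24h)
    end: Optional[int] = None  # no new occurrences start after this

    def occurrences(self) -> Iterator[tuple[int, int]]:
        """Yield the half-open intervals [SA + n*A, SA + n*A + DA)."""
        s = self.start
        while self.end is None or s <= self.end:
            yield (s, s + self.duration)
            s += self.period
```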

Now if you have two of these recurring events with periods A and B, you can think of each of them as a periodic function, and the combination of them will have a period equal to LCM(A, B). So we could check everything modulo LCM(A, B) once, and that would hold infinitely far forward.
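As a sketch, that modulo-LCM idea can be brute-forced like this (my function names; integer time units as above, no end dates):

```python
from math import lcm

def overlaps_brute_force(sa: int, a: int, da: int,
                         sb: int, b: int, db: int) -> bool:
    """Do [sa + n*a, sa + n*a + da) and [sb + m*b, sb + m*b + db)
    ever intersect? The combined pattern repeats every lcm(a, b),
    so one full combined period past the later start sees every
    relative alignment the two recurrences will ever be in."""
    horizon = max(sa, sb) + lcm(a, b)
    # Fine as a sketch: roughly a*b/gcd(a,b)^2 pairs to compare.
    return any(xa < xb + db and xb < xa + da
               for xa in range(sa, horizon, a)
               for xb in range(sb, horizon, b))
```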

But actually, we can do even better by taking the difference Δ = (SA - SB) mod GCD(A, B). You can sort of think of it as the closest approach between the two recurring events. If that's less than DA, or greater than GCD(A, B) - DB, these events are eventually going to overlap. There are some extra details to check whether the overlap happens before either end date (if any), but that's the idea in a nutshell.
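The constant-time version of that closest-approach check, as I'd sketch it (end dates left out, since they need the extra bookkeeping mentioned above):

```python
from math import gcd

def overlaps_closed_form(sa: int, a: int, da: int,
                         sb: int, b: int, db: int) -> bool:
    """O(1) equivalent of the brute force above, for never-ending
    recurrences: the gap between a B start and an A start can only take
    values (sa - sb) mod g shifted by multiples of g = gcd(a, b),
    so only the closest approach matters."""
    g = gcd(a, b)
    delta = (sa - sb) % g   # the closest approach Δ
    return delta < da or delta > g - db
```

Cross-checking it against the brute-force version on small random inputs is a cheap way to gain confidence in the edge cases (Δ = 0, durations of at least GCD(A, B), and so on).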

---

Where it gets really interesting is when you introduce timezones and daylight saving time (DST). Now a day is not always 24h, so there isn't a nice [SA+n*A, SA+n*A+DA) at regular intervals, and we can't naively work modulo the LCM or GCD anymore.

But we can notice that the days of the year on which a DST transition falls generally follow simple rules, and would repeat every 28 years (actually every 400 years, since the leap-year pattern only repeats on the full Gregorian cycle). In practice for a given timezone, for example CET/CEST, there are only 14 unique days of the year on which the DST transitions can fall, and the pattern repeats after a full cycle.
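This is easy to see empirically by scanning the tz database, e.g. with Python's zoneinfo (my sketch, using Europe/Paris as a concrete zone for CET/CEST; zoneinfo projects the current rules forward):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def transition_days(tz_name: str, first_year: int, last_year: int) -> set[tuple[int, int]]:
    """Record the local (month, day) of every UTC-offset change,
    scanning one instant per day."""
    tz = ZoneInfo(tz_name)
    found: set[tuple[int, int]] = set()
    t = datetime(first_year, 1, 1, 12, tzinfo=timezone.utc)
    prev = t.astimezone(tz).utcoffset()
    while t.year <= last_year:
        t += timedelta(days=1)
        cur = t.astimezone(tz).utcoffset()
        if cur != prev:
            local = t.astimezone(tz)
            found.add((local.month, local.day))
        prev = cur
    return found

# Under the current EU rules (last Sundays of March and October), each
# transition can fall on at most 7 calendar days per month, so this
# prints 14 distinct (month, day) pairs.
print(sorted(transition_days("Europe/Paris", 2024, 2123)))
```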

So if we want to check for overlap, we can treat those DST transition days as special cases; all the time intervals between DST transition days look locally "UTC-like" (time doesn't jump forward or backward, if we ignore leap seconds), so the previous formula continues to work within each of them.

But at some point things become a little messy, so we made some simplifications =)

My napkin math suggests that the supercycle for two recurring events A and B with periods < 365 days, combined with the 400-year cycle for DST transitions, can be about a hundred thousand years in the worst case.
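For anyone who wants to redo the napkin math: the supercycle is the least common multiple of the two periods and the 400-year Gregorian cycle of 146097 days. A sketch (my framing, with illustrative periods; actual numbers depend heavily on which factors the periods share with 146097 = 3^3 * 7 * 773):

```python
from math import lcm

GREGORIAN_CYCLE_DAYS = 400 * 365 + 97   # 146097: full calendar/DST repeat

def supercycle_years(a: int, b: int) -> float:
    """Years until two recurrences with periods a and b (in days)
    realign with the 400-year calendar/DST pattern."""
    return lcm(a, b, GREGORIAN_CYCLE_DAYS) / 365.2425

print(supercycle_years(21, 28))    # weeks-based periods: ~1,600 years
print(supercycle_years(90, 365))   # ~290,000 years; shared factors with
                                   # 146097 are what keep this in check
```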

But considering that timezones and daylight saving time are fundamentally political decisions, and that the IANA tzdata file is updated regularly, there is not much sense in applying the current rules thousands of years into the future! So we check at most 400 years ahead, and when timezones are involved we treat the overlap check as best effort, prepared for the fact that it could miss a DST transition: no matter what math we do, a country could change the rules at any time, and a day that used to be 24h could become 25h, 22:30, or even be skipped entirely, invalidating all past assumptions.

Thank you so much for sharing this! That sounds like an equally fun and headache-inducing problem indeed!

The idea that someone could book a room with 400 years of consistency guarantees is somewhat hilarious, yet absolutely the kind of thing that would check a box in a regulated environment like healthcare!

It does speak to a larger issue I see quite often, which is that capturing structured intent as of the time an event is created, along with the context that led to that decision, is incredibly hard to model. No matter what, something about the system will change in an unpredictable way, like daylight saving time or assumptions around resource/room availability, such that the data used to drive a confident no-conflict evaluation at insertion time is no longer sufficient to automatically resolve conflicts.

There's no single way to solve this problem - in a way, "reinterpreting intent" is why programming is often equal parts archaeology and anthropology. Certainly gives us very interesting problems to solve!

To circle it back to the original conversation: the better our databases are at handling large volumes of written data, the more context we can capture around insertion-time intent in an easy-to-query way, and the better we'll be at adapting to unforeseen circumstances!