You cannot say HTAP is dead when the alternative is so much complexity and so many moving parts. Most enterprises are burning huge amounts of resources literally just shuffling data around for zero business value.

The dream is a single data mesh presenting an SQL userland where I can write and join data from across the business with high throughput and low latency. With that, I can kill off basically every microservice that exists and work on stuff that matters at pace, instead of half of all projects being infrastructure churn. We are close, but we are not there yet, and I will be furious if people stop trying to reach this endgame.

> The dream is a single data mesh presenting an SQL userland where I can write and join data from across the business with high throughput and low latency.

That exists, and has for years: an extremely large DB loaded to the gills with RAM and local NVMe drives. Add some read replicas if you need them, similarly configured. Dedicate one for OLAP.
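To make that shape concrete, here is a minimal sketch of the routing, assuming Postgres with a streaming replica, psycopg2, and made-up hostnames: writes and point lookups hit the primary, while long analytical scans hit the replica dedicated to OLAP.

```python
# Rough sketch of the "one big box plus replicas" shape. Postgres, psycopg2,
# and the hostnames are assumptions; not production code.
import psycopg2

PRIMARY_DSN = "host=db-primary dbname=business user=app"    # all writes (OLTP)
OLAP_REPLICA_DSN = "host=db-olap dbname=business user=app"  # dedicated analytics replica

def run_oltp(sql, params=None):
    """Point reads and writes go to the primary."""
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall() if cur.description else None

def run_olap(sql, params=None):
    """Long scans and joins go to the replica so they can't stall the primary."""
    with psycopg2.connect(OLAP_REPLICA_DSN) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
```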

This doesn’t work quite as well as people assume. The first limit is simply size: you can only cram a few petabytes of NVMe into a server (before any redundancy), and many operational analytic workloads are quite a bit larger these days. In theory, one of the major advantages of disaggregated storage is that it removes size limits entirely. Many operational analytic workloads don’t need a lot of compute, just sparse, on-demand access to vast amounts of data. With good selectivity (another open issue), you could get excellent performance in this configuration.
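Rough numbers, with assumed (not vendor-specific) drive and chassis sizes, to show where that ceiling sits:

```python
# Back-of-envelope only; the bay count and drive size are assumptions.
drive_bays = 24          # dense 2U NVMe chassis
tb_per_drive = 61        # top end of currently shipping large NVMe drives
raw_pb = drive_bays * tb_per_drive / 1000
print(f"~{raw_pb:.1f} PB raw, before any replication or RAID")  # ~1.5 PB
```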

Ignoring the storage size limits, the real issue as you scale up is that the I/O schedulers, caching, and low-level storage-engine mechanics in a large SQL database are not designed to operate efficiently on storage volumes this large. They will technically work, but they scale quite a bit more poorly than people expect. The internals of SQL databases are (sensibly) optimized for working sets no larger than roughly 10x RAM, regardless of storage size. That turns out to be the more practical limit for analytics in a scale-up system, even if you have a JBOD of fast NVMe at your disposal.
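Putting assumed numbers on that rule of thumb shows how small the comfortable zone is relative to what you can physically attach:

```python
# Back-of-envelope only; the RAM and NVMe sizes are assumptions.
ram_tb = 2                        # a large single-node RAM config
nvme_tb = 1500                    # roughly the petabyte-and-a-half from above
working_set_tb = 10 * ram_tb      # the ~10x-RAM rule of thumb
print(f"comfortable working set ~{working_set_tb} TB, "
      f"i.e. ~{working_set_tb / nvme_tb:.1%} of the attached storage")
# -> ~20 TB, about 1.3% of what is physically attached
```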

> many operational analytic workloads are quite a bit larger these days.

What are the use cases where such workloads come up, aside from Google-level operations? Just trying to understand what we are talking about.

Sensor and telemetry analytics workloads in boring industrial sectors are all at this scale, even at companies that aren’t that large revenue-wise. TBs to PBs of new data per day.
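For a sense of scale, with made-up but plausible fleet numbers:

```python
# Illustrative only; every number here is an assumption about a fleet.
sensors = 200_000
samples_per_second = 10
bytes_per_sample = 64      # timestamp + ids + a handful of channels
seconds_per_day = 86_400
tb_per_day = sensors * samples_per_second * bytes_per_sample * seconds_per_day / 1e12
print(f"~{tb_per_day:.0f} TB of raw samples per day")  # ~11 TB/day, before any derived data
```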

What are these used for, such that they all have to be in a single unified database?

A large part of those workloads is stitching together, from the samples you get from each individual source, a single derived model of operational reality and how different entities interact over time. You need a running log of all entity behavior and interactions to look back on in order to contextualize what you see at the current point in time. Most of this is not pre-computable because the combinatorial state space is too large, so every analytic query needs to be able to see across every relationship between sources that can be inferred.

It is essentially a spatial and/or graph analytic model evolving over time. Any non-trivial data model that captures dynamics in the physical world looks like this.
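As a toy, in-memory sketch of that shape (hypothetical types, not any real platform's model): an append-only log of timestamped interactions between entities, queried as "who was this entity interacting with around time t?"

```python
# Toy sketch only; the types and field names are assumptions for illustration.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Interaction:
    ts: float          # observation time
    src: str           # entity that produced the sample
    dst: str           # entity it was observed interacting with
    payload: dict      # raw sample fields

class TemporalGraph:
    def __init__(self):
        self.by_entity = defaultdict(list)   # entity -> [Interaction], append-only

    def record(self, event: Interaction):
        self.by_entity[event.src].append(event)
        self.by_entity[event.dst].append(event)

    def neighbors_near(self, entity: str, t: float, window: float):
        """Everything this entity touched within +/- window of time t."""
        return {
            (e.src if e.dst == entity else e.dst)
            for e in self.by_entity[entity]
            if abs(e.ts - t) <= window
        }
```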

In fairness, all popular analytics platforms handle these workloads poorly, regardless of whether they are vertically or horizontally scaled. These workloads usually cannot be cached even in theory, so performance and scalability come down to the sophistication of your scheduler design.

Thanks. It would be interesting to talk about the business specifics, but that would move into confidential territory I guess.

It works up to a point, yes, but I daresay the overwhelming majority of OLTP needs are in the <= TB range, not PB. OLAP is its own beast, though I'll also say that most modern tech companies' schemas are hot garbage: full of denormalized tables for no good reason, JSON everywhere, etc. The entire thing could be much, much smaller if the RDBMS were used as it was intended: relationally.
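A toy illustration of the bloat argument, with a made-up record and ignoring compression, TOAST, and indexes entirely: a JSON blob repeats every key name in every row, while a relational row stores only values (the keys live once, in the schema).

```python
# Toy illustration only; the record and its layout are made up.
import json, struct

row = {"order_id": 123456, "customer_id": 987, "status": "shipped", "amount_cents": 4599}

as_json = json.dumps(row).encode()            # key names repeated in every row
as_tuple = struct.pack("<qqi", row["order_id"], row["customer_id"], row["amount_cents"]) \
           + row["status"].encode()           # values only, as a packed row

print(len(as_json), "bytes as JSON vs", len(as_tuple), "bytes as a packed row")
# roughly 84 bytes vs 27 bytes for this record, before any compression
```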

> You cannot say HTAP is dead when the alternative is so much complexity and so many moving parts. Most enterprises are burning huge amounts of resources literally just shuffling data around for zero business value.

We built an HTAP platform as a layer over Cassandra for precisely that reason, right around the time Gartner coined the term.

In finance and fintech, there are ample use cases where the need for transactional consistency and the need for horizontal scalability to process and report on large volumes come together, and where banks really struggle to meet requirements.

I dug out an old description of our platform, updated it a bit, and put it on Medium, in case anyone is interested: https://medium.com/@paul_42036/a-technical-description-of-th...

A sibling mentioned GraphQL. That works, but it was really built for clients interacting with Meta's Ent framework. The web layer is largely a monolith, and user objects are modeled as "ents," linked to each other, and stored in heavily cached MySQL. GraphQL exposes access to them.

Maybe GraphQL can be your savior?

The problem is generally not the surface language, but the underlying distribution/querying mechanism.

It is, of course, possible that SQL is too complex a language for this dream.

GraphQL is just a query language, much like SQL.

Yes, but it provides a standardized way to deliver a unified interface for querying your data, which is what the OP is after?

“With high throughput and low latency” are the operative requirements for the OP, and GraphQL doesn’t help with either on its own.