> many operational analytic workloads are quite a bit larger these days.
What are the use cases where such workloads come up, aside from Google-level operations? Just trying to understand what we are talking about.
Sensor and telemetry analytics workloads in boring industrial sectors are all at this scale, even at companies that aren’t that large revenue-wise. TBs to PBs of new data per day.
What are these used for that they all have to be in a single unified database?
A large part of those workloads is stitching together a single derived model of operational reality and how different entities interact over time from the samples you get from each individual source. You need a running log of all entity behavior and interactions over time to look back on in order to contextualize what you see at the current point in time. Most of this is not pre-computable because the combinatorial state space is too large so every analytic query needs to be able to see across every relationship between sources that can be inferred.
It is essentially a spatial and/or graph analytic model evolving over time. Any non-trivial data model that captures dynamics in the physical world looks like this.
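To make the shape of this concrete, here is a minimal sketch (not any specific product's API; all names are hypothetical) of the model described above: a time-stamped log of entity interactions, queried to reconstruct an entity's relationships as of some point in time rather than from a precomputed aggregate.

```python
from collections import defaultdict

class InteractionLog:
    """Append-only log of observed interactions between entities."""

    def __init__(self):
        # entity -> list of (timestamp, other_entity) observations
        self._edges = defaultdict(list)

    def record(self, t, a, b):
        """Record one observed interaction between entities a and b at time t."""
        self._edges[a].append((t, b))
        self._edges[b].append((t, a))

    def neighbors_as_of(self, entity, t):
        """All entities seen interacting with `entity` at or before time t."""
        return {other for (ts, other) in self._edges[entity] if ts <= t}

log = InteractionLog()
log.record(1, "sensor-a", "machine-1")
log.record(2, "sensor-b", "machine-1")
log.record(3, "sensor-a", "machine-2")

# Reconstruct machine-1's relationships as they stood at time 2
# (set, so ordering is unspecified): {'sensor-a', 'sensor-b'}
print(log.neighbors_as_of("machine-1", 2))
```

The point of the sketch is that the answer to any query depends on which time slice you ask about, which is why these workloads resist precomputation: the state space of (entity, relationship, time) combinations is too large to materialize up front.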
In fairness, all popular analytics platforms handle these workloads poorly regardless of whether they are vertically or horizontally scaled. These workloads usually cannot be cached even in theory, so performance and scalability come down to the sophistication of your scheduler design.
Thanks. It would be interesting to talk about the business specifics, but that would move into confidential territory I guess.