We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.
If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.
You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50 TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.
For the vast majority of use cases, 20TB is positively enormous.
RDS caps out at 64 TB unless you use Aurora, so 20 TB is totally manageable without sharding.
This product is for Postgres deployments that are so large they need to be sharded. For these use cases, I think 20TB is about normal.
Yes. But for most workloads it is not much for PostgreSQL. You often will not have to shard at all.
Sure, but 20TB in “the only database you need” is mere hours or minutes worth of data for many workflows.
That article seems to suggest 20TB total across the dozen deployments in prod.