> Does it really cost that much to maintain your own 20PB storage cluster?

If you think S3 = storage cluster, then the answer is no.

If you think of S3 as what it actually is: scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet, then the answer is yes.

>scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet

If you need to tick all of those boxes for every single byte of 20PB worth of data, you are working on something very cool and unique. That's awesome.

That said, most entities that have 20PB of data only need to tick a couple of those boxes, usually encryption/reliability. Most of their 20PB will get accessed at most once a year, from a predictable location (i.e. on-prem), with a good portion never accessed at all. Or if it is regularly accessed (with concomitant low latency/high throughput requirements), it almost certainly doesn't need to be globally distributed with tier1 access. For these entities, a storage cluster and/or tape system is good enough. The problem is that they naïvely default to using S3, mistakenly thinking it will be cheaper than what they could build themselves for the capabilities they actually need.
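To make the cost intuition concrete, here's a back-of-envelope sketch. Every number in it is an illustrative assumption, not a quote: the S3 figure uses the published list price for S3 Standard (~$0.021/GB-month, ignoring volume discounts, requests, and egress), and the self-hosted figures (hardware capex, amortization period, opex) are hypothetical placeholders you'd replace with your own estimates.

```python
# Back-of-envelope monthly cost for 20PB: S3 Standard list price vs.
# a self-hosted cluster. All non-S3 figures are hypothetical assumptions.

PB_IN_GB = 1_000_000          # decimal GB per PB, as storage is billed
capacity_gb = 20 * PB_IN_GB

# S3 Standard list price, ~$0.021/GB-month; excludes discounts,
# request charges, and egress.
s3_monthly = capacity_gb * 0.021

# Assumed self-hosted cluster: capex amortized over 5 years, plus
# power, colo space, and staff. Placeholder numbers.
hw_capex = 4_000_000           # disks, chassis, network, replication overhead, spares
amortized_monthly = hw_capex / (5 * 12)
opex_monthly = 60_000          # power, space, a couple of engineers

self_hosted_monthly = amortized_monthly + opex_monthly

print(f"S3 Standard:  ${s3_monthly:,.0f}/month")
print(f"Self-hosted:  ${self_hosted_monthly:,.0f}/month")
```

Under these assumptions S3 comes out around $420k/month against roughly $127k/month self-hosted; the point isn't the exact numbers but that at 20PB scale the comparison is worth actually running rather than defaulting to S3.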