> Does it really cost that much to maintain your own 20PB storage cluster?

If you think S3 = storage cluster, then the answer is no.

If you think of S3 as what it actually is: scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet, then the answer is yes.

>scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet

If you need to tick all of those boxes for every single byte of 20PB worth of data, you are working on something very cool and unique. That's awesome.

That said, most entities that have 20PB of data only need to tick a couple of those boxes, usually encryption/reliability. Most of their 20PB will get accessed at most once a year, from a predictable location (i.e. on-prem), with a good portion never accessed at all. Or if it is regularly accessed (with concomitant low latency/high throughput requirements), it almost certainly doesn't need to be globally distributed with tier1 access. For these entities, a storage cluster and/or tape system is good enough. The problem is that they naïvely default to using S3, mistakenly thinking it will be cheaper than what they could build themselves for the capabilities they actually need.
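To make the cost intuition concrete, here's a back-of-envelope sketch. Every number in it is an illustrative assumption, not a quote: the S3 figure uses the published list price for S3 Standard (~$0.021/GB-month, ignoring volume discounts, requests, and egress), and the self-hosted figures (hardware capex, amortization period, opex) are hypothetical placeholders you'd replace with your own estimates.

```python
# Back-of-envelope monthly cost for 20PB: S3 Standard list price vs.
# a self-hosted cluster. All non-S3 figures are hypothetical assumptions.

PB_IN_GB = 1_000_000          # decimal GB per PB, as storage is billed
capacity_gb = 20 * PB_IN_GB

# S3 Standard list price, ~$0.021/GB-month; excludes discounts,
# request charges, and egress.
s3_monthly = capacity_gb * 0.021

# Assumed self-hosted cluster: capex amortized over 5 years, plus
# power, colo space, and staff. Placeholder numbers.
hw_capex = 4_000_000           # disks, chassis, network, replication overhead, spares
amortized_monthly = hw_capex / (5 * 12)
opex_monthly = 60_000          # power, space, a couple of engineers

self_hosted_monthly = amortized_monthly + opex_monthly

print(f"S3 Standard:  ${s3_monthly:,.0f}/month")
print(f"Self-hosted:  ${self_hosted_monthly:,.0f}/month")
```

Under these assumptions S3 comes out around $420k/month against roughly $127k/month self-hosted; the point isn't the exact numbers but that at 20PB scale the comparison is worth actually running rather than defaulting to S3.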