For a workload of that size you would be able to negotiate private pricing with AWS or any cloud provider, not just Cloudflare. You can get a private pricing deal on S3 with as little as half a PB. Not saying that your overall expenses would be cheaper w/ a CSP than DIY, but it's not exactly an apples-to-apples comparison to put full retail prices for the CSPs up against eBayed equipment and free labor (minus the cost of the pizza).
egress costs are the crux for AWS, and they didn't budge when we tried to negotiate that with them. without a break on egress it's just entirely unusable for AI training. I think the cloudflare private quote is pretty representative of the cheaper end of managed object-bucket storage.
obv, as we took on this project, the delta between our cluster and the next-best option got smaller, in part bc the ability to host it ourselves gives us negotiating leverage, but managed bucket products are fundamentally overspecced for simple pretraining dumps. glacier does a nice job of fitting the needs of archival storage at a good cost, but there's nothing similar for ML needs atm.
What sort of deal are you talking about? Would it be 50% or more?
You can get way higher than 50% discounts with AWS (or any cloud) depending upon the scale of the buy.
Not for that minimum 0.5PB volume.
Even at 10PB, the storage commit discounts won't be anywhere near 50%. Probably more like 10-20%, if that.