Used disks, no DR: not exactly a real shootout.

True, though this is specifically for pretraining data (S3 wouldn't sell us used disk + no DR storage).

I do appreciate the scrappiness of your solution. Used drives for a storage cluster is like /r/homelab on steroids. And since it's pretraining data, I suppose data integrity isn't critical.

Most venture-backed startups would have just paid the AWS or Cloudflare tax. I certainly hope your VCs appreciate how efficient you are being with their capital :)

Worth stressing that we literally could not afford pretraining without this; approximately our entire seed round would have gone into cloud storage costs.

You're in a seismically active part of the world. Will the venture last in a total loss scenario?

We're currently 1/1 for the recent 4.3 magnitude earthquake (though if SF crumbles we might lose data)

4.3 is a baby quake. I'd hope that you'd be 1/1!

They spent $300,000 on drives; with AWS they would have spent 4x that PER MONTH. They're already ahead of the cloud.

AWS/cloud doesn't factor into my question whatsoever. Loss of equipment is one thing. Loss of all data is quite a different story.