I do appreciate the scrappiness of your solution. Used drives for a storage cluster is like /r/homelab on steroids. And since it's pretraining data, I suppose data integrity isn't critical.

Most venture-backed startups would have just paid the AWS or Cloudflare tax. I certainly hope your VCs appreciate how efficient you are being with their capital :)

Worth stressing that we literally could not afford pretraining without this; approximately our entire seed round would have gone into cloud storage costs.