It's quite cheap to just store data at rest, but I'm pretty confused by the training and networking setup here. From other comments it sounds like you're not going to put the GPUs in the same location, so you'll be doing all training over X 100 Gbps lines between sites? Aren't you going to end up totally bottlenecked during pretraining?

30 PB / 100 Gbps comes out to about a month, and 4 links would give you a week, so that seems quite acceptable for a training run, especially since you can overlap the initial loading of the array with the first training run, i.e. train as data becomes available.
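A quick back-of-the-envelope in Python, assuming ideal link utilization (no protocol overhead or contention):

    # Transfer time for 30 PB over N x 100 Gbps links.
    PETABYTE = 1e15        # bytes
    LINK_BPS = 100e9       # 100 Gbps per link

    data_bits = 30 * PETABYTE * 8
    for links in (1, 4):
        days = data_bits / (LINK_BPS * links) / 86400
        print(f"{links} x 100 Gbps: {days:.1f} days")

    # 1 x 100 Gbps: 27.8 days
    # 4 x 100 Gbps: 6.9 days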

It goes without saying that any data pre-processing needs to happen either before writing (i.e. at the storage site) or on the training GPUs themselves.

Yeah, we just have the one 100 Gbps link; atm that's about all the GPU clusters can pull, but we'll prob expand bandwidth and storage as we scale.

Worth noting that we do have a bunch of 4090s in the colo, and they've been super helpful for e.g. calculating embeddings and such for data splits.
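For a sense of what that looks like, here's a minimal sketch assuming sentence-transformers and an off-the-shelf model (both the library and the model choice are illustrative, not necessarily what's actually running on those 4090s):

    # Hypothetical batch-embedding sketch for building data splits.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    def embed(texts, batch_size=256):
        # Returns a (len(texts), dim) array; handy for clustering or
        # dedup when deciding how to partition the dataset.
        return model.encode(texts, batch_size=batch_size, show_progress_bar=True)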

How did you arrive at the decision not to put the GPU machines in the colo? Were the power costs going to be too high? Or do you just expect to need more physical access to the GPU machines vs the storage ones?

When I was working at sfcompute prior to this, we saw multiple datacenters literally catch fire because the industry wasn't experienced with the power density of H100s. Our training chips just aren't a standard package in the way JBODs are.

Isn't the easy option to spread the machines out, i.e. only half-fill each rack?

A GPU cluster next to my servers has done this; presumably they couldn't get 64 A in one rack, so they've got 32 A in each of two. (230 V, three-phase.)
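Rough numbers on what that buys you, assuming the 32 A figure is per phase at 230 V and ignoring power factor (that reading of the parent comment is an assumption):

    # Apparent rack power budget for a 3-phase feed.
    VOLTS_PER_PHASE = 230
    PHASES = 3

    for amps in (32, 64):
        kw = VOLTS_PER_PHASE * amps * PHASES / 1000
        print(f"{amps} A/phase -> ~{kw:.0f} kW per rack")

    # 32 A/phase -> ~22 kW per rack
    # 64 A/phase -> ~44 kW per rack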

Rack space is at a premium at most data centers.

My info may be dated, but power density has gone up a ton over time. I'd expect a lot of datacenters to have plenty of space, but not much power. You can only retrofit so much additional power distribution and cooling into a building designed for much less power density.

This is my experience as well. We have 42U racks with just 8 machines in them because we can't get more power circuits to the rack.

yep this was the case for us.

I'm more surprised that a data centre will apparently provide more power to a rack than is safe to use.

Adding the compute story would be interesting as a follow-up.

Where is that done? How many GPUs do you need to crunch all that data? Etc.

Very interesting and refreshing read, though. Feels like this is more what Silicon Valley is about than just the usual: tf apply, then smile and dial.