>The communication speeds are untenable.
Can it be parallelized or not?
If you take a model, make two copies, and fine-tune each one on different data, what happens when you merge them? Does it work if you freeze different layers?
I think this works if the steps are small enough. And the transfer should become tenable if the steps are big enough. Where's the cutoff?
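To make the question concrete: the simplest version of "merge them" is plain parameter averaging. Here's a toy sketch, not any real library's API. `merge_weights`, the layer names, and the list-of-floats "tensors" are all made up for illustration, and it assumes both copies started from the same base model. Whether the averaged model is actually any good is exactly the open question above; this only shows the mechanics, including the frozen-layer case.

```python
def merge_weights(copies, frozen=()):
    """Average parameters across fine-tuned copies of the same base model.

    copies: list of dicts mapping layer name -> list of floats
            (a toy stand-in for real tensors).
    frozen: layer names that were never updated during fine-tuning;
            these are copied from the first model instead of averaged.
    """
    merged = {}
    for name in copies[0]:
        if name in frozen:
            # Frozen layers are identical in every copy, so nothing to merge.
            merged[name] = list(copies[0][name])
            continue
        # Element-wise mean across all copies of this layer.
        columns = zip(*(c[name] for c in copies))
        merged[name] = [sum(col) / len(copies) for col in columns]
    return merged


# Two copies that drifted apart in layer0 but kept layer1 frozen.
copy_a = {"layer0": [1.0, 2.0], "layer1": [0.5, 0.5]}
copy_b = {"layer0": [3.0, 4.0], "layer1": [0.5, 0.5]}
merged = merge_weights([copy_a, copy_b], frozen=("layer1",))
```

The "cutoff" then comes down to how many local steps each copy takes between merges: fewer steps means the copies stay close and the average is safer, but you pay for more transfers.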
Yes, it can be parallelized; it already is in real AI datacenters. And no, that doesn't help you. Like everyone else is saying, an AI datacenter is not just a bunch of gaming GPUs connected via normal Ethernet, and hasn't been for years.
At most, a decentralized effort could contribute a little to some bigger centralized effort by doing inference and sandboxed CPU work. Modern model training isn't just backprop; it has a large and growing CPU and inference component too, which doesn't require intense inter-node communication. For instance, RL rollouts for agentic coding require a lot of plain old inference plus sandboxed containers for the models to practice in. The final results are just a set of rollouts and scores that can be uploaded to a central datacenter, where GRPO adjusts the weights (relatively cheap). But then, of course, you'd have to stick to models small enough to fit on people's computers, so it'd never be competitive.
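To show why the upload side is small, here's a rough sketch of the group-relative step that GRPO is named for: each rollout's score is normalized against the other rollouts for the same prompt. The function name, the `eps` value, and the example scores are my own placeholders, and real implementations differ in details (e.g. how the standard deviation is computed), so treat this as an illustration rather than the actual training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-8):
    """Normalize one prompt's rollout scores against their own group.

    scores: one scalar reward per rollout, all for the same prompt.
    eps:    small constant (assumed value) to avoid dividing by zero
            when every rollout got the same score.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population std dev; a choice made for this sketch
    return [(s - mu) / (sigma + eps) for s in scores]


# Four rollouts of the same coding task: two passed the tests, two failed.
rollout_scores = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rollout_scores)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one. All a volunteer machine would need to send back is the rollout text plus one number per rollout; the weight update itself happens centrally.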
Kinda sounds like we just need better computers.