So: distribute a full copy of the model in RAM to each of several machines, make each machine responsible for updating a different part of the model weights, and sync those updates over the network
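A minimal single-process sketch of that scheme, with plain in-memory copies standing in for the network transfer (worker count, shard assignment, and the fake gradient are all illustrative assumptions, not a real implementation):

```python
import random

# every worker holds a full replica of a toy model (a flat weight vector);
# each worker "owns" one contiguous shard of the parameters
N_WORKERS = 4
N_PARAMS = 8
LR = 0.1

replicas = [[0.0] * N_PARAMS for _ in range(N_WORKERS)]
shard = N_PARAMS // N_WORKERS
owned = {w: range(w * shard, (w + 1) * shard) for w in range(N_WORKERS)}

def fake_grad(replica, rng):
    # stand-in for a real backward pass on that worker's local data
    return [rng.uniform(-1, 1) for _ in replica]

rng = random.Random(0)
for step in range(3):
    # 1) each worker updates only the shard it owns
    for w, replica in enumerate(replicas):
        g = fake_grad(replica, rng)
        for i in owned[w]:
            replica[i] -= LR * g[i]
    # 2) "network" sync: every worker pulls each shard from its owner
    #    (the owner's own shard is untouched by the copy, so order is safe)
    for replica in replicas:
        for owner, idxs in owned.items():
            for i in idxs:
                replica[i] = replicas[owner][i]

# after the sync step, all replicas agree again
assert all(r == replicas[0] for r in replicas)
```

Real systems replace the copy loop with collective ops (all-gather/all-reduce) and batch the shard exchange to amortize network latency, which is the expensive part when the machines are geographically spread out.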

decentralized training makes a lot more sense when the required hardware isn't a $40K GPU...