Is that actually how they train them in the datacenter? The trillion-parameter weight vector gets cloned, sent off to groups of GPUs, and then averaged afterward?
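
To make the question concrete, here is a toy sketch of the scheme being asked about: clone the weights, let each worker take a step on its own slice of data, then average. It is only an illustration under made-up assumptions (a 2-parameter linear model, 4 simulated workers in a plain loop, and the hypothetical helper names `grad` and `data_parallel_step`), not a claim about how any particular datacenter does it.

```python
import random

def grad(w, shard):
    """Mean-squared-error gradient of a linear model on one data shard."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(shard)
    return g

def data_parallel_step(w, shards, lr):
    """Clone w to each simulated worker, take one local SGD step, average the results."""
    replicas = []
    for shard in shards:          # each iteration stands in for one group of GPUs
        local = list(w)           # the "clone" of the weight vector
        g = grad(local, shard)
        replicas.append([wi - lr * gi for wi, gi in zip(local, g)])
    n = len(replicas)
    return [sum(r[i] for r in replicas) / n for i in range(len(w))]

# Synthetic, noiseless data generated from known weights (illustrative only).
random.seed(0)
true_w = [2.0, -1.0]
data = []
for _ in range(64):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, sum(t * xi for t, xi in zip(true_w, x))))

shards = [data[i::4] for i in range(4)]   # 4 equal-sized shards, one per worker
w = [0.0, 0.0]
for _ in range(500):
    w = data_parallel_step(w, shards, lr=0.1)
```

One thing the sketch makes visible: when every worker starts from the same weights and takes a single step, averaging the updated weights is the same as averaging the gradients and applying one update, so with equal-sized shards it matches a single step on the whole batch.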