It seems like load_score serves as a proxy for how much work remains to be done. Is there a real value that could be used instead? The solution requires syncing with all of the GPU nodes anyway, so each node could report that value during the sync.
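A minimal sketch of the idea, assuming each GPU node can report a measured amount of outstanding work as part of the existing sync. `NodeStatus`, `pending_work`, and `pick_least_loaded` are hypothetical names for illustration only, not part of the current code:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeStatus:
    # Hypothetical per-node report collected during the sync with the GPU nodes.
    node_id: str
    # Assumed metric: actual outstanding work (e.g. queued jobs), not a heuristic score.
    pending_work: int


def pick_least_loaded(statuses: Sequence[NodeStatus]) -> str:
    """Return the id of the node reporting the least outstanding work.

    Ties go to the first node in the sequence, since min() keeps the earliest minimum.
    """
    if not statuses:
        raise ValueError("no GPU nodes reported status")
    return min(statuses, key=lambda s: s.pending_work).node_id
```

For example, `pick_least_loaded([NodeStatus("gpu-0", 12), NodeStatus("gpu-1", 3)])` returns `"gpu-1"`. Because the values come from the nodes themselves, there is no separate score to keep in agreement with the real load.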