I'm not sure it does a sort. Each group of threads only handles a select number of Gaussians.

Yeah, I think avoiding sorting is kinda the whole point here.