Co-resident threads might not get any speedup here, since coherency instructions are functionally operations on the L2 cache.