Yeah, DepthAnything (especially DA3) is also really good. The space is moving so quickly, and I always try to keep an eye on the trending HF depth estimation models :D
We frame the association model as an instance segmentation ranking problem. Given a set of freight item segmentations, the question becomes: which freight item is most likely being scanned? To answer it, our model needs to analyze the pose of the workers, consider multiple frames, and map the whole scene to 3D.
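
To make the ranking framing a bit more concrete, here is a toy sketch. It is not the actual model, just a simple distance heuristic under assumed inputs: per-frame depth maps, boolean instance masks with a consistent item order across frames, a hand keypoint in pixel coordinates from some pose estimator, and pinhole intrinsics. The names `backproject` and `rank_items` are made up for illustration.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection: depth map (H, W) -> 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def rank_items(frames, fx, fy, cx, cy):
    """Rank freight items by mean 3D distance to the worker's hand.

    frames: list of (depth, masks, hand_uv) tuples, where
      depth   is an (H, W) float array,
      masks   is an (N, H, W) bool array of non-empty item masks
              (assumed: same item order in every frame),
      hand_uv is the (u, v) pixel of the scanning hand (assumed input).
    Returns (item indices from most to least likely, mean distances).
    """
    per_frame = []
    for depth, masks, (hu, hv) in frames:
        pts = backproject(depth, fx, fy, cx, cy)
        hand = pts[hv, hu]  # 3D position of the hand keypoint
        # 3D centroid of each segmented item
        centroids = np.array([pts[m].mean(axis=0) for m in masks])
        per_frame.append(np.linalg.norm(centroids - hand, axis=1))
    # Averaging over frames smooths out single-frame noise
    mean_dist = np.mean(per_frame, axis=0)
    return np.argsort(mean_dist), mean_dist
```

The first index in the returned order is the item closest to the hand on average. A learned model would replace the distance score with something richer (full body pose, motion over time), but the output shape stays the same: one score per segmentation, sorted.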