Very interesting, I work in a similar space with diverse cameras and we’ve been using DepthAnything for a while, but I hadn’t seen these new models yet so thanks!
The association model seems like the special sauce. Is there anything more you can share about it (architecture, inputs and outputs, etc.)? I’m always really interested in unique CV models.
Yeah, DepthAnything (especially DA3) is also really good. The space is moving so quickly, and I always try to keep an eye on the trending HF depth estimation models :D
We frame the association model as an instance segmentation ranking problem: given a set of freight item segmentations, which freight item is most likely being scanned? To answer that, our model needs to analyze the pose of the workers, consider multiple frames, and map the whole scene to 3D.
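
For anyone wondering what a "ranking over segmentations" framing could look like in code, here is a toy sketch. It is an illustration only and not the actual association model: it assumes each candidate segmentation has already been lifted to one 3D centroid per frame, and it swaps in a simple hand-distance heuristic where a learned scorer would go. `Candidate`, `rank_candidates`, `hand_positions`, and `temperature` are all hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Candidate:
    """One freight item segmentation, lifted to 3D (hypothetical structure)."""
    item_id: str
    centroids: List[Point3D]  # one 3D centroid per frame


def rank_candidates(
    candidates: List[Candidate],
    hand_positions: List[Point3D],
    temperature: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (item_id, probability) pairs, most likely item first.

    Assumption: the item being scanned is the one whose centroid stays
    closest to the worker's hand across the frames. Each candidate is
    expected to have at least one frame.
    """
    if not candidates:
        return []

    scores = []
    for cand in candidates:
        # Mean hand-to-item distance over the frames both sequences cover.
        dists = [math.dist(c, h) for c, h in zip(cand.centroids, hand_positions)]
        scores.append(-(sum(dists) / len(dists)) / temperature)

    # Softmax turns the scores into a distribution over candidates.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    ranked = zip((c.item_id for c in candidates), probs)
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Two frames, two candidate items; the hand stays near "pallet_a".
hand = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]
items = [
    Candidate("pallet_a", [(0.1, 0.0, 1.0), (0.1, 0.0, 1.0)]),
    Candidate("pallet_b", [(2.0, 0.0, 1.0), (2.0, 0.0, 1.0)]),
]
ranking = rank_candidates(items, hand)
```

The point of the sketch is the shape of the problem (a set of candidates in, a ranked distribution out), with pose, multiple frames, and 3D entering only through the per-frame distances.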