I’m not sure this has much to do with vision as opposed to fancy self-calibration software. At least a few years ago, Tesla cars would spend a while in self-calibration mode after delivery, working out the exact positions of their cameras. I think the idea is that it’s cheaper to figure out in software where everything is than to calibrate the camera mounts and lenses at the factory.

I see no reason that LiDAR couldn’t participate in a similar algorithm.

A bigger issue would be knowing the shape of the car to avoid clipping an obstacle.

It probably could, but I imagine a LiDAR system would need a similarly large amount of training data to enable effective self-calibration across a wide variety of situations.

At some point, with enough sensor suites deployed, we might be able to generalize better and get effective few-shot training for self-calibration of new sensor suites.

Isn’t the model needed rather similar to what’s needed for sensor fusion in general? If you can extract features from each sensor that you expect to match features from another sensor, you can collect a bunch of those matched samples and use them to fit the transformation between one sensor’s world space and the other’s.
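
Right, and for the rigid case that fit even has a closed-form least-squares solution (Kabsch/Procrustes). A minimal sketch in NumPy, assuming you already have matched 3D feature positions from the two sensors; the function name and the camera/LiDAR labels are just illustrative:

    import numpy as np

    def fit_rigid_transform(src, dst):
        # Least-squares rotation R and translation t mapping src onto dst
        # (Kabsch/Procrustes). src, dst: (N, 3) arrays of matched feature
        # positions from two sensors. Assumes the transform is rigid.
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)     # cross-covariance matrix
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = dst_c - R @ src_c
        return R, t

    # Sanity check: recover a known transform from noisy matched samples.
    rng = np.random.default_rng(0)
    true_R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(true_R) < 0:
        true_R[:, 0] *= -1                      # force a proper rotation
    true_t = np.array([0.5, -0.2, 1.0])
    cam = rng.normal(size=(200, 3))             # "camera-frame" features
    lidar = cam @ true_R.T + true_t + rng.normal(scale=0.01, size=(200, 3))
    R, t = fit_rigid_transform(cam, lidar)
    print(np.allclose(R, true_R, atol=0.01), np.allclose(t, true_t, atol=0.01))

The hard part in practice is the step before this: reliably producing those cross-sensor feature matches in the first place, which is where the training data comes in.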