It probably could, but I imagine a LIDAR system would need a similar (large) amount of training data to enable effective self-calibration across a wide variety of situations.
At some point, with data from enough sensor suites, we might be able to generalize better and get effective few-shot training for self-calibration of new sensor suites.
Isn’t the model needed rather similar to what’s needed for sensor fusion in general? If you can extract features from each sensor that you expect to match features from a different sensor, then you can collect a bunch of these matched samples and use them to fit the transformation between one sensor’s world space and the other’s.
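For the rigid LIDAR-to-camera case, that last fitting step can be as simple as a least-squares rigid alignment. Here’s a minimal sketch (Python/NumPy, with illustrative names and synthetic data; it assumes you already have matched 3D feature points from both sensors) of a Kabsch-style fit of the rotation and translation between the two frames:

```python
import numpy as np

def fit_rigid_transform(points_a, points_b):
    """Least-squares rigid transform (Kabsch) mapping sensor A points onto
    matched sensor B points. Both arrays are (N, 3); row i of each is assumed
    to be the same physical feature as seen by each sensor."""
    centroid_a = points_a.mean(axis=0)
    centroid_b = points_b.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (points_a - centroid_a).T @ (points_b - centroid_b)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction keeps the result a proper rotation (no reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_b - R @ centroid_a
    return R, t  # points_b ≈ points_a @ R.T + t

# Synthetic example: recover a known extrinsic from noisy matched features
rng = np.random.default_rng(0)
true_R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(true_R) < 0:
    true_R[:, 0] *= -1          # force a proper rotation
true_t = np.array([0.5, -0.2, 1.0])
lidar_pts = rng.uniform(-10, 10, size=(200, 3))
camera_pts = lidar_pts @ true_R.T + true_t + rng.normal(scale=0.02, size=(200, 3))
R_est, t_est = fit_rigid_transform(lidar_pts, camera_pts)
print(np.allclose(R_est, true_R, atol=1e-2), np.allclose(t_est, true_t, atol=1e-2))
```

The hard, data-hungry part is the step before this: reliably extracting and matching features across very different sensor modalities in all conditions, which is where the large training sets come in.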