I used to work at a company that did self-driving. The sensor setup was more complicated than Tesla's (cameras, lidar, etc.), but the fact that FSD can still work on this car despite the cameras being in a different place is really impressive to me. Our sensors were pretty sensitive to accurate calibration, and iirc any time we tried to move our sensor array to a new car it took a ton of work to reconfigure it before the sensor fusion output worked again.
This is one of the real advantages of the (often insulted and/or chastised) vision-only approach to FSD.
People can easily adapt to different vehicles in a similar manner.
Most sensors can be implemented in a way that enables self-calibration.
I'm oversimplifying it here, but the macro process is taking some known attributes and mapping them to what you are observing. For example, if you can detect people, and you know the average height of a person, you can compute where your horizon is, and where you should (or shouldn't) expect to see people in the FOV. You can do this with cameras, lidar, etc. When you have multiple sensors you can do a lot more to have them all sample an object in their own ways and converge on agreement of where they are relative to each other and the object.
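To make that concrete, here's a toy Python sketch of the "known attribute" trick described above: pedestrian bounding boxes plus an assumed average person height and an assumed camera mounting height (both numbers are made up for illustration, not from any real system) give you an estimate of the horizon row, and from that the camera pitch. It assumes a simple pinhole camera over flat ground; a real system would be far more involved.

```python
import numpy as np

# Toy sketch only. Assumptions (all hypothetical): pinhole camera, flat ground,
# known camera mounting height, nominal average pedestrian height of 1.7 m.

AVG_PERSON_HEIGHT_M = 1.7   # assumed prior, not a measured constant
CAMERA_HEIGHT_M = 1.4       # assumed mounting height above the road

def horizon_from_detections(detections, camera_height_m=CAMERA_HEIGHT_M,
                            person_height_m=AVG_PERSON_HEIGHT_M):
    """Estimate the image row of the horizon from pedestrian bounding boxes.

    detections: iterable of (v_head, v_feet) pixel rows for each detected person.
    For a pinhole camera over flat ground (rows increase downward):
        v_feet - v_horizon = f * camera_height / Z
        v_feet - v_head    = f * person_height / Z
    so each detection independently gives
        v_horizon = v_feet - (camera_height / person_height) * (v_feet - v_head).
    """
    estimates = []
    for v_head, v_feet in detections:
        box_height_px = v_feet - v_head
        if box_height_px <= 0:
            continue  # skip degenerate boxes
        estimates.append(v_feet - (camera_height_m / person_height_m) * box_height_px)
    # Median is a cheap way to be robust to kids, cyclists, bad boxes, hills, etc.
    return float(np.median(estimates)) if estimates else None

def pitch_from_horizon(v_horizon, cy, focal_px):
    """Camera pitch (radians) implied by the horizon row, relative to the optical axis."""
    return np.arctan2(cy - v_horizon, focal_px)

# Made-up (head_row, feet_row) detections in pixels:
boxes = [(300, 470), (350, 460), (280, 495)]
v_h = horizon_from_detections(boxes)
print(v_h, pitch_from_horizon(v_h, cy=360.0, focal_px=1000.0))
```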
I’m not sure this has much to do with vision as opposed to fancy self-calibration software. At least a few years ago, Tesla cars would be in self-calibration mode for a while after delivery while they calibrated their cameras. I think the idea is that it’s cheaper to figure out in software where everything is than to calibrate the camera mounts and lenses at the factory.
I see no reason that LiDAR couldn’t participate in a similar algorithm.
A bigger issue would be knowing the shape of the car to avoid clipping an obstacle.
It probably could, but I imagine a LIDAR system would need a similar (large) amount of training data to enable effective self-calibration across a wide variety of situations.
At some point, with enough sensor suites, we might be able to generalize better and get effective few-shot (or at least lower-shot) training for self-calibration of new sensor suites.
Isn’t the model needed rather similar to what’s needed for sensor fusion in general? If you can extract features from each sensor that you expect to match to features from a different sensor, then you can collect a bunch of samples of this sort of data and then use it to fit the transformation between one sensor’s world space and another sensor’s world space.
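For what it's worth, the "fit the transformation between one sensor's world space and another's" step is pretty classical once you have matched features. Here's a minimal Python sketch using the Kabsch / orthogonal Procrustes solution for the rotation and translation between two sensors' frames; a real pipeline would wrap this in outlier rejection (e.g. RANSAC) and accumulate matches over time. All the data in the example is synthetic.

```python
import numpy as np

def fit_rigid_transform(points_a, points_b):
    """Least-squares R, t such that points_b ≈ R @ points_a + t.

    points_a, points_b: (N, 3) arrays of matched 3D points, one per sensor frame.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)

    centroid_a = points_a.mean(axis=0)
    centroid_b = points_b.mean(axis=0)

    # Cross-covariance of the centered point sets.
    H = (points_a - centroid_a).T @ (points_b - centroid_b)
    U, _, Vt = np.linalg.svd(H)

    # Force a proper rotation (no reflection) when the geometry is near-degenerate.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_b - R @ centroid_a
    return R, t

# Usage with synthetic data: invent a ground-truth extrinsic and recover it.
rng = np.random.default_rng(0)
pts_a = rng.uniform(-10, 10, size=(50, 3))        # features in sensor A's frame
angle = np.deg2rad(5.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.2, -0.3, 0.05])
pts_b = pts_a @ R_true.T + t_true + rng.normal(0, 0.01, size=pts_a.shape)

R_est, t_est = fit_rigid_transform(pts_a, pts_b)
print(np.allclose(R_est, R_true, atol=1e-2), np.round(t_est, 3))
```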
Teslas require a camera calibration after windshield replacements, same as any other car.
The calibration is a 10-minute drive on any marked road though, not the precisely positioned target setup that others use.
There is actually a photo of a Model 3 on the factory line doing exactly that precisely positioned target calibration, so they do use it at the factory. It's also nearly instant compared to a 10-minute drive.
I think the real reason Tesla is known for requiring a 10-minute calibration drive is that they shipped APHW2 long before the software matured, so they needed a way to calibrate after the cars were shipped "blank". Other manufacturers only ship finalized hardware and software, so they don't need a scalable, tool-free calibration method.
Anyways, my point is that Tesla cars need calibration like anything else. The same goes for any multi-sensor SLAM system, whether it uses sets of color cameras or laser spinny thingies or laser flash cameras or laser flash color camera thingies or combinations thereof.