Sensor fusion is also hard to get right: since you still need cameras, you have to fuse the two information streams. That's mainly a software problem, and companies like Waymo have done it, but Tesla was having trouble with it earlier, and if you don't do it right, your self-driving system can be less reliable.
Sensor fusion seems like it'd be a big problem when you're handcoding lots of C++, and way less of a problem when all the sensors are just feeding into one big neural network, as Tesla and probably others are doing now. The training process takes care of it from there.
One of Udacity's first courses was on self-driving, taught by Sebastian Thrun, who had earlier led the Google self-driving car project that became Waymo. He went through some Bayesian math that takes a collection of lidar points, with each point contributing to a probabilistic assessment of what's really going on. It's fine if different points seem to contradict each other, because you're looking for the most likely scenario that could produce that combined sensor data. Transformers can do the same sort of thing, and even with different sensor types it's still the same sort of problem.
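To make the "most likely scenario" idea concrete, here's a rough sketch of my own, not the course's actual code. It assumes each reading is the true obstacle distance plus independent Gaussian noise, starts from a uniform prior over a grid of candidate distances, and picks the candidate with the highest posterior. The function names, the readings, and the noise levels are all made up for illustration.

```python
import math


def gaussian_log_likelihood(measured, true_value, sigma):
    """Log density of a reading given a hypothesised true value (Gaussian noise assumed)."""
    z = (measured - true_value) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def posterior_over_distance(readings, candidates):
    """Posterior over candidate distances, given (measured, sigma) readings.

    Assumes a uniform prior and independent readings, so the log-likelihoods
    simply add up for each candidate.
    """
    log_post = [
        sum(gaussian_log_likelihood(m, d, s) for m, s in readings)
        for d in candidates
    ]
    peak = max(log_post)  # subtract the peak before exponentiating to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data: four lidar returns that disagree slightly, plus one
# much noisier camera-based range estimate. Distances are in metres.
candidates = [i * 0.1 for i in range(201)]  # 0.0 m to 20.0 m in 0.1 m steps
lidar = [(10.2, 0.3), (9.9, 0.3), (10.4, 0.3), (9.7, 0.3)]
camera = [(11.0, 1.5)]

posterior = posterior_over_distance(lidar + camera, candidates)
best_index = max(range(len(candidates)), key=lambda i: posterior[i])
best = candidates[best_index]
print(f"most likely distance: {best:.1f} m")
```

Nothing in `posterior_over_distance` cares which sensor a reading came from. Each one is just a measurement with a noise level, which is why mixing sensor types is still the same sort of problem.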
> Sensor fusion is also hard to get right: since you still need cameras, you have to fuse the two information streams
The response to the challenge shouldn't be to whittle down your sensor suite to a single type, but to get good at sensor fusion.
I think this is the key. In theory, more information streams, when fused together (properly), should reduce error. If their stumbling block is the "properly" part, then the rest of those justifications come off as a pretty weak way to sidestep their own inability to deliver this properly.
We have lots of evidence of similar strategies being used in other domains, and this seems like an especially life-critical domain that ought to have high rigor and standards applied.
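For the "fused properly should reduce error" claim above, here's a minimal sketch using textbook inverse-variance weighting. It assumes two independent, unbiased Gaussian estimates of the same distance. The `fuse` helper and all the numbers are hypothetical, and this isn't a claim about what any particular company runs.

```python
def fuse(estimates):
    """Fuse independent Gaussian estimates given as (mean, variance) pairs.

    Each estimate is weighted by its precision (1 / variance), so the fused
    variance is never larger than the smallest input variance.
    """
    precision = sum(1.0 / var for _, var in estimates)
    mean = sum(m / var for m, var in estimates) / precision
    return mean, 1.0 / precision


# Hypothetical numbers: a tight lidar estimate and a loose camera estimate (metres).
lidar_only = (10.0, 0.09)
camera_only = (11.0, 2.25)
fused_mean, fused_var = fuse([lidar_only, camera_only])

# The "properly" part: if the camera wrongly claims to be very precise,
# the same formula drags the answer toward the camera's reading.
overconfident_mean, _ = fuse([lidar_only, (11.0, 0.01)])

print(f"fused: {fused_mean:.2f} m, variance {fused_var:.3f}")
print(f"with an overconfident camera: {overconfident_mean:.2f} m")
```

With honest noise estimates, the fused variance comes out below the lidar's own, so the extra stream helps even though it's the worse sensor. With a miscalibrated noise estimate, the fused answer can end up worse than lidar alone. That failure mode is the "properly" being argued about.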