More reference images from different angles will always give more accurate 3D information. A single 2D image leaves a lot of ambiguity: several different 3D shapes can look identical in 2D. Additional cues like lighting and shadows help, but more real signal from more images will always be better.
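
To make the ambiguity concrete, here is a rough sketch assuming an idealized pinhole camera. The `project` helper and its `camera_x` and `focal` parameters are made up for illustration and don't come from any particular library. A small square near the camera and a square twice as large and twice as far away land on exactly the same image coordinates, and only a second, shifted viewpoint tells them apart.

```python
def project(point, camera_x=0.0, focal=1.0):
    """Pinhole projection of a 3D point for a camera at (camera_x, 0, 0)
    looking down +Z. Hypothetical helper, for illustration only."""
    x, y, z = point
    return (focal * (x - camera_x) / z, focal * y / z)


# A small square close to the camera, and one twice as large, twice as far.
near_square = [(-1, -1, 2), (1, -1, 2), (1, 1, 2), (-1, 1, 2)]
far_square = [(2 * x, 2 * y, 2 * z) for x, y, z in near_square]

# View A: camera at the origin. Both squares project to the same 2D outline.
view_a_near = [project(p) for p in near_square]
view_a_far = [project(p) for p in far_square]

# View B: camera shifted one unit along X. The outlines now differ.
view_b_near = [project(p, camera_x=1.0) for p in near_square]
view_b_far = [project(p, camera_x=1.0) for p in far_square]

print("view A identical:", view_a_near == view_a_far)
print("view B identical:", view_b_near == view_b_far)
```

From view A alone there is no way to tell which square you are looking at. Adding view B is what breaks the tie.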

I'm not saying it wouldn't be; that much is obvious.

Agreed. I wasn't arguing, just trying to add some information in case it isn't obvious to anyone.