That's why a hybrid approach is needed. The agent shouldn't be making up dimensions based on an image. It should use OCR to extract the size table from the datasheet, feed it into a parametric table, and only then map it onto the base enclosure template.
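Roughly, the flow could look like the sketch below. It assumes the OCR step has already run and returned plain text. The sample table, the `EN-100`/`EN-150` model names, the column layout, and the `EnclosureTemplate` fields (wall thickness, clearance) are all made up for illustration and don't come from any real datasheet or CAD library.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical OCR output of a datasheet size table (values are invented).
OCR_TEXT = """\
Model   L (mm)  W (mm)  H (mm)
EN-100  100.0   60.0    25.0
EN-150  150.0   80.0    35.5
"""

# One data row: a model name followed by exactly three numbers.
ROW = re.compile(
    r"^(\S+)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)$"
)


def parse_size_table(text: str) -> dict[str, dict[str, float]]:
    """Turn OCR text into a parametric table keyed by model name."""
    table: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header line or OCR noise: skip it, never guess
        model, length, width, height = match.groups()
        table[model] = {
            "length": float(length),
            "width": float(width),
            "height": float(height),
        }
    return table


@dataclass(frozen=True)
class EnclosureTemplate:
    """Assumed base template: outer size = part size + walls + clearance."""
    wall: float = 2.0        # mm, assumed default
    clearance: float = 0.5   # mm per side, assumed default
    length: float = 0.0
    width: float = 0.0
    height: float = 0.0


def fit_enclosure(template: EnclosureTemplate, model: str,
                  table: dict[str, dict[str, float]]) -> EnclosureTemplate:
    """Map one table row onto the template; fail loudly if the row is missing."""
    if model not in table:
        raise KeyError(f"no extracted dimensions for {model!r}")
    row = table[model]
    pad = 2 * (template.wall + template.clearance)  # both sides of each axis
    return replace(
        template,
        length=row["length"] + pad,
        width=row["width"] + pad,
        height=row["height"] + pad,
    )


table = parse_size_table(OCR_TEXT)
enclosure = fit_enclosure(EnclosureTemplate(), "EN-100", table)
print(enclosure)
```

The important part is that `fit_enclosure` raises when a model isn't in the extracted table, so a missing or unreadable row stops the pipeline instead of letting the agent fill in a number.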