There's a new ICCV 2025 paper that basically tries to render every modality as an image: https://openaccess.thecvf.com/content/ICCV2025/papers/Hudson...