That's a multimodal model with text output. I think GP is asking about image generators.