What I also like about MiMo is that it's multimodal.

For example, I can send it screenshots of what I'm developing and it understands them.

Only the non-pro is multimodal, I believe.

I thought only the MiMo 2.5 non-pro was multimodal?