Thanks. I will try this! I need to read up on how to work with vision models for both generation and understanding.