Good to have more open weight models, and I really appreciate the in-depth write-up.
I also like the "keep the manifold wide" approach of trying to make a model capable of many styles, as opposed to getting it "dialed in" for a dozen style presets.
But it does feel very much like "fighting the last war", now that advanced "image-to-image"/"agentic composition" models like Nano Banana 2 or Images 2.0 are out there in force.
I seriously doubt that the basic Qwen 3 VL in cross can get anywhere near that level of I2I. And robust I2I is very desirable: editing, adjustment, character consistency, and the generalization of whatever you're doing with style transfer now (underexplained, BTW).
Hitting that level of I2I is by no means easy, but it's pretty clear to me that this is where the next frontier for image models lies. It feels like Ideogram might be building up to it, but I have yet to see it anywhere else in the open-weight space.
I appreciate the skepticism, but we find internally that this model is used more than Nano Banana for many cases like moodboarding (also, being 4x cheaper than NBP never hurts). Agentic workflows are compatible with Krea 2, so I'm not sure I follow there. If you are talking about an edit model, that's coming too.
Also, we are on par with them in T2I benchmarks; check the Artificial Analysis link I posted in my top comment.
And you cannot re-train Nano Banana or ChatGPT to understand your brand, which is what our customers complain about constantly.
Plus it's open source! It's hard to do an apples-to-apples comparison.
"Compatible" is one thing - "built for" is a different beast. The difference can be like that between Images 1.0 and Images 2.0 - the sheer leap in compositional capabilities was staggering.
"Edit model" is a part of it, yes. So is style transfer. But less as an endpoint and more of a subset of what advanced I2I enables.
"Re-train to understand your brand" is a fine marketing pitch, but in practical terms, it's hard to justify burning a LoRA for most uses. Enthusiasts absolutely do it, but enthusiasts are built different. Robust I2I can accomplish a lot of the same, but with a workflow that's closer to "drag and drop your references" than to "try to get a LoRA to do what you wanted it to do on a very slim set of images".
Modern LoRA pipelines are getting closer to "reliable" and "braindead simple", but you can't escape the "wait N hours for the GPUs to churn" of fine-tuning no matter what you do. And iteration time kills: a lot of the value of AI in workflows is that it does what it does fast and lets you iterate at speed.
You can think of "LoRA vs I2I" as the image counterpart of "SFT vs in-context learning" in LLM land. Both are useful, and neither fully substitutes for the other, but there's a reason most people reach for the latter well before they reach for the former.
I like the T2I from what I've seen, mind. Perhaps more than Images 2.0 or even NB2. I just think that focusing solely on T2I to the exclusion of advanced editing and composition capabilities is a very 2024 thing.
"it's hard to justify burning a LoRA for most uses" -> Not really; it's literally cheaper on Krea than using ChatGPT Images. NBP and GPT-Images 2.0 are quite expensive, you'd be surprised. LoRAs are one of our stickiest features (this doesn't mean they are intuitive; it just means that customers who use them are retained far better, because of how much better their images become). But yeah, nothing else out there offers a nice training UI like Krea's, where you can just drag and drop a moodboard and get a LoRA in a few minutes. It literally takes only a few minutes on Krea, definitely not "N hours for GPUs to churn".
Learn more here: https://www.krea.ai/blog/krea-2-lora-training.
> And you cannot re-train nano banana or ChatGPT to understand your brand, which is what our customers complain about constantly.
yeah... having been in this line of work, it's a dead end. i think you know that though.
elsewhere:
> LoRAs are [something that you don't need in gpt-images-2 because it just does the task people ask for].
look... I don't know. I want you to succeed. You're talking about customers for whom none of this matters. Anyone who uses the word "brand" doesn't have any artistic sensibility. Pricing is 1/4 of Gemini's, but who cares? Your customers generate maybe 10,000 or fewer images per month, probably closer to 100. Then there are $20k/mo enterprise contracts, where forward deployed engineers don't really make sense, because what does a Krea FDE know about art? Nothing.
it would be different if the business were "Palantir, but for creative stuff." First, you'd have to know what that means, which you don't, you're hearing about it for the first time from me. Second, does that even exist? It took Palantir 15 years to go anywhere. I don't think it can exist in this environment.
do we have too many open-weight image models? the end user either wants the slop-filled, mindless, meme-chasing crap that is obsolete on arrival, or they want one idea they can express easily with existing, limited generative AI and then have a professional artist finish. Gemini can do that just fine.
This model does image-to-image, so what's the issue with Qwen 3 VL? And is style transfer really unexplained? "Reference" is mentioned 11 times on the page (more specifically, I read it and it seemed to discuss it a lot).