I'm debating whether to add the FLUX Kontext model to my GenAI image comparison site. The Max variant definitely scores higher in prompt adherence, nearly doubling Flux 1.dev's score, but it still falls short of OpenAI's gpt-image-1, which (visual fidelity aside) sits at the top of the leaderboard.
I liked keeping Flux 1.dev around just to have a nice baseline for local GenAI capabilities.
https://genai-showdown.specr.net
Incidentally, we did add the newest release of Hunyuan's Image 2.0 model, but as expected of a real-time model, it scores rather poorly.
EDIT: In fairness to Black Forest Labs, this model definitely seems more focused on editing capabilities, refining and iterating on existing images, rather than on strict text-to-image creation.
Thanks for sharing, this was a great read.
Nice site! I have a suggestion for a prompt that I could never get to work properly. It's been a while since I tried it, and the models have probably improved enough that it should be possible now.
I was surprised at how badly the models performed. It's a fairly iconic scene, and there's more than enough training data. Making an accurate flail (stick, chain, ball) is a fun sport... weird things tend to happen.
Wondering if you could add "Flux 1.1 Pro Ultra" to the site? It's supposed to be the best among the Flux family of models, and far better at prompt adherence than Flux Dev (3rd among your current candidates).
Adding it would also provide a fairer assessment of a leading open-source model family.
The site is a great idea and features very interesting prompts. :)
Looks good! Would be great to see Adobe Firefly in your evaluation as well.
please add! cool site thanks :)