This is an amazing test, and it's kinda funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. a Google search and copy-paste) any day over the awful OpenAI result. It doesn't even seem like they have a sanity-check/post-processing step ("did I follow the instructions correctly?"), because the digit-style constraint violation should be easy to catch. It's also expensive as shit just to get an image that's essentially unusable.

This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...

Did it correctly follow the instructions? I don't know my Pokémon well enough to tell.

Essentially yes (the bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2, so it's not a surprising result. The image I linked uses the raw API.

Note that the styles are different; there are two digit images rendered in color.

Color charcoal drawings do exist, but it’s not what’s usually meant by “charcoal drawing”.

That is interesting, because I feel gpt-image-1 did have that feature.

(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)

You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, rewrites the prompt, passes it to the image model, and then may read the generated image back and provide output. The raw image model, accessed through the API, just takes your prompt verbatim and generates an image.
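To make the distinction concrete, here's a toy sketch of the two flows. All function names here are hypothetical stand-ins, not real OpenAI API calls; the point is only where prompt rewriting happens:

```python
# Toy sketch: raw image API vs. a chat frontend wrapping the same model.
# Every name below is a made-up stand-in, not a real library call.

def image_model(prompt: str) -> str:
    """Stand-in for the underlying image model: it sees exactly the
    prompt string it is given, with no interpretation layer."""
    return f"image generated from: {prompt!r}"

def raw_api_flow(user_prompt: str) -> str:
    # The API path: the user's prompt goes to the model verbatim.
    return image_model(user_prompt)

def chat_flow(user_prompt: str) -> str:
    # A chat frontend first rewrites/expands the prompt (restating
    # constraints, adding style detail) before calling the model...
    rewritten = f"{user_prompt} -- restate all constraints; verify digit style"
    image = image_model(rewritten)
    # ...and may then inspect the result before replying to the user.
    return image

print(raw_api_flow("charcoal drawing of a Pokémon"))
print(chat_flow("charcoal drawing of a Pokémon"))
```

So identical user input can reach the model as two different prompts, which is why ChatGPT results and raw-API results aren't directly comparable.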

Nano Banana Pro and ChatGPT Images 2.0 also tweak the prompt because they can think.

Yes, exactly: "ChatGPT Images 2.0" is part of ChatGPT. That is not a model.

I wouldn't say it's terrible. I also wouldn't say it's a huge step forward in quality compared to what I've seen before from AI.