This is an amazing test, and it's kinda funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. a Google search and copy-paste) any day over the awful OpenAI result. It doesn't even seem like they have a sanity-check/post-processing step ("did I follow the instructions correctly?"), because the digit-style constraint violation should be easy to catch. It's also expensive as shit just to get an image that's essentially unusable.

This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...

Did it correctly follow the instructions? I don't know my Pokémon well enough to tell.

Essentially yes (the bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2, so it's not a surprising result. The image I linked uses the raw API.

Note that the styles are different; there are two digit images rendered in color.

Color charcoal drawings do exist, but it’s not what’s usually meant by “charcoal drawing”.

That is interesting, because I feel gpt-image-1 did have that feature.

(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)

You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, rewrites the prompt, passes it to the image model, and then may read the generated image back and provide output. The raw image model, accessed through the API, just takes your prompt verbatim and generates an image.
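To make the distinction concrete, here's a toy sketch of the two flows. All function names here are hypothetical stand-ins, not real OpenAI API calls; the point is only where prompt rewriting happens:

```python
# Toy sketch: raw image API vs. a chat frontend wrapping the same model.
# Every name below is a made-up stand-in, not a real library call.

def image_model(prompt: str) -> str:
    """Stand-in for the underlying image model: it sees exactly the
    prompt string it is given, with no interpretation layer."""
    return f"image generated from: {prompt!r}"

def raw_api_flow(user_prompt: str) -> str:
    # The API path: the user's prompt goes to the model verbatim.
    return image_model(user_prompt)

def chat_flow(user_prompt: str) -> str:
    # A chat frontend first rewrites/expands the prompt (restating
    # constraints, adding style detail) before calling the model...
    rewritten = f"{user_prompt} -- restate all constraints; verify digit style"
    image = image_model(rewritten)
    # ...and may then inspect the result before replying to the user.
    return image

print(raw_api_flow("charcoal drawing of a Pokémon"))
print(chat_flow("charcoal drawing of a Pokémon"))
```

So identical user input can reach the model as two different prompts, which is why ChatGPT results and raw-API results aren't directly comparable.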

Nano Banana Pro and ChatGPT Images 2.0 also tweak the prompt because they can think.

Yes, exactly: "ChatGPT Images 2.0" is part of ChatGPT. That is not a model.

I wouldn't say it's terrible. I also wouldn't say it's a huge step forward in quality compared to what I've seen before from AI.