I think 4o image generation in ChatGPT is actually still a tool call with a prompt to an “image_gen” tool; I don’t think the generator receives the full context of the conversation. If you do a ChatGPT data export and inspect the record of a conversation that used 4o image gen, you’ll see a tool call with its own distinct prompt, much like it was with DALL-E. And if you pass an image in as context, that gets passed to the tool as well.
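
If you want to check this yourself, here is a rough sketch of how you might scan an export for tool calls. Everything about the layout is an assumption on my part rather than documented behavior: that the export contains a `conversations.json` file, that each conversation has a `mapping` of nodes, and that each node's `message` carries a `recipient` field (with `"all"` meaning an ordinary chat message) plus a `content` object holding `text` or `parts`. The helper name `find_tool_calls` is made up for illustration.

```python
import json
from pathlib import Path


def find_tool_calls(conversations):
    """Yield (title, recipient, payload) for messages addressed to a tool.

    Assumed layout (not officially documented): each conversation has a
    "mapping" dict of nodes, and each node may hold a "message" dict.
    """
    for conv in conversations:
        for node in (conv.get("mapping") or {}).values():
            msg = node.get("message") or {}
            recipient = msg.get("recipient")
            # Assumption: ordinary chat messages have recipient "all" (or none).
            if not recipient or recipient == "all":
                continue
            content = msg.get("content") or {}
            # Assumption: the payload sits in "text" or in a list called "parts".
            payload = content.get("text") or content.get("parts")
            yield conv.get("title"), recipient, payload


if __name__ == "__main__":
    path = Path("conversations.json")  # assumed file name inside the export
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        for title, recipient, payload in find_tool_calls(data):
            print(f"{title!r} -> {recipient}: {payload}")
```

It filters on "anything not addressed to `all`" rather than on a specific tool name, since I'm not sure what recipient string the export actually records for 4o image gen.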

I imagine this is for anti-jailbreak moderation reasons, which is understandable.