Current models are trained on image pastiche and style remixing. But there's no reason you couldn't add an Artistic Director layer which has been trained on emotional and cultural signifiers to direct the pastiche and remixing.
The practical problem is that models have very limited prompt adherence. The level of detail you can specify in scene design is very crude. So you can get the slop effect where there's a lot of in-fill pastiche detail, but you could never create something like this, where all of the incidental objects are specifically included to enforce the message.
https://en.wikipedia.org/wiki/The_Awakening_Conscience
It's basically the professional version of the "Draw me a pelican on a bicycle" problem.
There are situations where you want that level of creative control, and current image generators don't get close to it.
And without it you can't get to the meta-creativity level where you're creating a new aesthetic that's a cultural landmark - which is what the famous artists did, and still do.
>But there's no reason you couldn't add an Artistic Director layer which has been trained on emotional and cultural signifiers to direct the pastiche and remixing.
I gave this approach a shot over the first few months of this year[1] (although my director didn't have any custom training). The results were interesting, but I wouldn't call them "art", since they're low-quality derivative pieces. With reasoning traces enabled, you can see that there's not much intent going on, though the models do attempt to include "incidental objects" to reinforce meaning, like in this jungle scene[2].
[1] https://news.ycombinator.com/item?id=48105385
[2] https://www.liamlaverty.com/paint-by-language-model/inspect/...
Sounds like a skill issue?
Recent image models are improving rapidly at prompt adherence specifically, and being able to iterate on the same image is propelling them even further. Images 2.0 is the poster child of this "agentic iterative image composition" approach.
Images 2.0 isn't anywhere close to the kind of detail control I'm talking about.
It's the opposite of a skill issue. No image generator is anywhere near the ballpark of pro-level manual Photoshop or Illustrator editing for individual elements in an image.
If you don't understand this, try precisely kerning the text in a generated book cover to handle letter combinations like A and V.
This is one of the big problems with GenAI. You can do new things with it, but it's a crude, Dunning-Kruger, good-enough-if-you-don't-ask-for-more kind of creativity.
The pros can see what most people can't, and to them the flaws and missing features are obvious and frustrating creatively, not just in terms of production values.
I fail to see anything other than a skill issue.
We went from "AI can't generate text that isn't at least 20% typos and it always looks like shit" to "some letter combinations aren't kerned to perfection sometimes and adjusting that with prompts is hard".