That whole thing would get you 1000 variants of existing art. But if you asked a thousand different designers to do a cover for the same book...
> 1000 variants of existing art.
This is very naive. I can almost guarantee that some combinations of 20 * 50 features will hit on something that has never been written before in that specific combination. And if that's still not enough, increase the number of features. Add more randomness, add more steering, add random steering in random chapters, change it up, and so on.
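To put a number on the combinatorics argument, here is a minimal Python sketch. It assumes "20 * 50 features" means 20 independent slots with 50 possible values each. That is a hypothetical reading, since the comment doesn't spell out the layout. Under that assumption, the number of distinct combinations is 50^20.

```python
# Hypothetical setup: 20 independent "feature" slots, each with 50 possible values.
# This is one reading of "20 * 50 features"; the comment does not specify the layout.
SLOTS = 20
OPTIONS_PER_SLOT = 50


def combination_count(slots: int, options: int) -> int:
    """Number of distinct ways to pick one option for every slot."""
    return options ** slots


count = combination_count(SLOTS, OPTIONS_PER_SLOT)
print(f"{count:.3e}")  # prints 9.537e+33
```

A large count only shows that the combinations are distinct. It says nothing about whether any of them is good.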
I'm an art director. Finding a sequence that hasn't been hit in that specific combination is not sufficient to justify paying someone $150 an hour to go be creative.
Sure, just like 1000 monkeys with typewriters will write 1000 technically unique books - but they are all still filled with the same garbage.
>will hit on something that has never been written before in that specific combination
That's a very low bar. The skill of an artist is not in writing something that "has never been written before in that specific combination"; it's in writing something that's unique or better than what was there, even if it has been written before in that specific combination.
> Add more randomness, add more steering, add random steering in random chapters, change it up, and so on.
That doesn't work for AI models. The whole training process depends on the basic principle that if you take the average of 100 of something (in this case, book cover designs), the average is less random than any individual cover that went into it.
So the output will, by necessity, be closer to the average.
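The averaging argument rests on a basic statistical fact: the mean of many samples varies much less than any single sample does. The toy Python sketch below illustrates it, assuming each cover design can be collapsed to a single number. That is a deliberate oversimplification. The sketch shows only the statistical intuition the comment leans on and does not model how a network is actually trained.

```python
import random
import statistics

# Illustrative only: each "design" is reduced to a single number drawn from
# a standard normal distribution. This is a toy stand-in, not a model of real images.
random.seed(0)


def sample_design() -> float:
    return random.gauss(0.0, 1.0)


# Spread of individual designs.
individual = [sample_design() for _ in range(10_000)]

# Average 100 designs at a time, repeated 1,000 times.
averages = [
    statistics.fmean([sample_design() for _ in range(100)]) for _ in range(1_000)
]

spread_individual = statistics.pstdev(individual)
spread_average = statistics.pstdev(averages)

# For independent samples, the spread of a mean of N shrinks by about sqrt(N),
# so with N = 100 the second number should be roughly a tenth of the first.
print(round(spread_individual, 2), round(spread_average, 2))
```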
The human learning algorithm is much, much more data-efficient than models. An absolute top human expert will have read/seen/heard/talked/... about 160 million "tokens" (that's about 2,000 books). Frankly, the nerve inputs of all the experiences of an entire human life, from baby to rewriting relativity theory, come to only a couple dozen gigabytes.
Qwen 3.6 27B has been trained on 8 trillion tokens (as in, it has seen that data ~10 to ~50 times). To put it another way: for every second you will have spent "gathering life experiences" by the time you're on your deathbed (i.e. your whole life), Qwen 3.6 27B has spent about 50,000 seconds learning. And really, that figure should be multiplied by the 10 or 50 training iterations.
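The ratio can be checked directly from the comment's own figures. The constants below are the rough estimates quoted in the thread, not measurements, and the variable names are just for illustration.

```python
# Figures as quoted in the comment; they are rough estimates, not measurements.
HUMAN_TOKENS = 160e6              # ~160 million tokens over an expert's lifetime
MODEL_TOKENS = 8e12               # ~8 trillion training tokens
PASSES_LOW, PASSES_HIGH = 10, 50  # quoted range for how many times the data is seen

# Raw data ratio: how much more text the model has been exposed to.
ratio = MODEL_TOKENS / HUMAN_TOKENS
print(f"{ratio:,.0f}")  # prints 50,000

# Multiplying by the number of passes, as the comment suggests.
low = ratio * PASSES_LOW
high = ratio * PASSES_HIGH
print(f"{low:,.0f} to {high:,.0f}")  # prints 500,000 to 2,500,000
```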
Add another three or so orders of magnitude and you've got ChatGPT. By this measure, the human brain outperforms ridiculously overspecced ML models (because that's what ChatGPT and the like are) in efficiency by a factor of 5 million or more. This is the reason humans still learn faster than ML models.
As for human training iterations, that's simple: it's 1. In fact, it's impossible to make it even 2. Of course, when it comes to human performance, we are a better but not fundamentally different version of genetic algorithms. Do most humans perform well? The honest answer is no. One in 1,000, and that's very generous, improves SOTA. You absolutely need the 1,000 failures, though, as anyone who's tried a PhD (or even just designing a large program) knows.
So we are very far away from allowing AI models to do what humans can do: take one example and produce a better output from it. And there will always be much more variation in that approach. But... most human attempts to do something are total crap. Most AI attempts to do something will succeed, but they'll be comparatively bland, tasteless, "without soul", ...
And this is ignoring the problem that AI also has a massive limitation (one that can't be solved, no matter how many Nvidia cards you have): it trains on historical data, and counterfactuals don't work. What would have happened had Shakespeare decided Macbeth's wife was a force for good? Would the king still get murdered? Would it still be a great story? You can't work with counterfactuals.
> That doesn't work for AI models.
Of course it does. I know it does because I've been using variations of this workflow since GPT-3. In fact, it's the only way it can work, since by design LLMs generate text from left to right. You can't expect it to produce original stuff if you don't give it the anchors for what original means. It'd be like going to a new bar every night and asking for a "beer that you haven't had before". There's no information to work with there.
What image generation models cannot replicate is the personal experience of the people who make art.
I'll give you an example. One of the most talented designers I employ is a nature lover and a bird-watcher. She has a unique mental profile, as well, in that she's synaesthetic between colors, letters and shapes. In other words, she has a unique neurological structure, coupled with high artistic talent, and an interest in a very particular realm of science.
What makes her work worth $150/hr is not just that her execution is often flawless. It's that you would not, and could not, think of a prompt that would make an AI model produce a new piece akin to anything she would come up with while thinking about what to draw. Could you have it replicate something she did? Obviously. But that means what you're doing is in the long tail, and in terms of quality and originality, it is by definition somewhere in the mediocre range.
And that's probably fine, for whatever you're doing. But an AI with any kind of prompt would not come up with a Studio Ghibli clone, if Studio Ghibli hadn't existed.
So you shouldn't imagine that you are actually getting any original output out of an LLM, regardless of how cleverly you design your prompts. But moreover, don't flatter yourself to think that you have the ideas to feed to a prompt which would generate truly original content and break free of the shackles imposed by its training. That is an illusion. Very few people have the propensity for generating new visual ideas, and that's why they're still in high demand. But their originality stems from their unique and impossible to replicate experience as individuals who have their own visual/mental map of the world.
The point was to take a random combination of story elements. Pick one each {King,dad,CEO} {betrays,kills,loves} {his enemy,the king,a foreign prime minister} and feed to an LLM.
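The "pick one from each slot" step can be sketched in a few lines of Python. The three lists come straight from the comment; the function names and the prompt template are hypothetical, and the call to an actual LLM is left out because the comment doesn't name a specific model or API.

```python
import random

# The three slot lists are the ones given in the comment; the function names
# and prompt wording below are hypothetical, for illustration only.
SUBJECTS = ["King", "dad", "CEO"]
VERBS = ["betrays", "kills", "loves"]
OBJECTS = ["his enemy", "the king", "a foreign prime minister"]


def random_premise(rng: random.Random) -> str:
    """Pick one element from each slot and join them into a premise."""
    return f"{rng.choice(SUBJECTS)} {rng.choice(VERBS)} {rng.choice(OBJECTS)}"


def build_prompt(premise: str) -> str:
    # Hypothetical prompt template; no specific LLM API is assumed here.
    return f"Write a short story outline based on this premise: {premise}."


rng = random.Random(42)  # seeded so the draw is reproducible
premise = random_premise(rng)
print(build_prompt(premise))
```

With three options per slot there are only 3 × 3 × 3 = 27 possible premises. Widening the lists or adding slots grows that number multiplicatively.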
The output will not be an intricate well designed epic storyline, but a cookie-cutter boring snoozefest.
BUT you can give that to a bunch of humans, who "insert their life experience" (i.e. parts of their training data, translated to LLM terms), and sometimes out comes Game of Thrones, Star Wars, ...