Really, the next big leap is something that gives me more meaningful artistic control over these systems.
It's usually "generate a few, one of them is not terrible, none are exactly what I wanted" then modify the prompt, wait an hour or so ...
The workflow reminds me of programming 30 years ago - you did something, waited for the compile, checked if it worked, tried something else...
All you've got are a few crude tools and a bit of grit and patience.
On the i2v tools I've found that if I modify the input to make the contrast sharper, the shapes more discrete, and the object easier to segment, I get better results. I wonder if there are hacks like that here.
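The kind of input preprocessing I mean, as a minimal sketch (filenames and enhancement factors are made up; tune per model):

```python
from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("input_frame.png").convert("RGB")

# Stretch contrast so shapes read as more discrete
img = ImageEnhance.Contrast(img).enhance(1.4)

# Unsharp mask to harden object boundaries for easier segmentation
img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))

img.save("input_frame_prepped.png")
```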
> The workflow reminds me of programming 30 years ago - you did something, waited for the compile, checked if it worked, tried something else...
Well sure... if your compiler was the equivalent of the Infinite Improbability Drive.
I assume you're referring to the classic positive/negative prompts that you had to attach to older SD 1.5 workflows. From the examples in the repo as well as the paper, it seems like AudioX was trained to accept relatively natural English using Qwen2.
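To be concrete about what I mean, in diffusers terms the SD 1.5-era pattern was roughly this (a sketch, not AudioX's API; the prompt strings are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# SD 1.5-era prompting: comma-separated tag soup plus a negative prompt,
# rather than natural-language instructions
image = pipe(
    prompt="masterpiece, best quality, portrait, dramatic lighting",
    negative_prompt="lowres, bad anatomy, blurry, watermark",
).images[0]
image.save("out.png")
```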
No, I'm talking about pretty recent stuff. I was dealing with https://huggingface.co/bytedance-research/UNO and https://huggingface.co/HiDream-ai/HiDream-I1-Full earlier today.
Both were released this month.
What I'd like to see is some kind of i2i with multiple image inputs plus guidance.
So I can roughly sketch and give some kind of destination - and I don't mean ControlNet or anything where I'm dealing with complex 3D characters, and I don't mean the crude stuff that inpainting gives ... none of these things are what I'm talking about.
I'm familiar with the ComfyUI workflows and stay pretty on top of things. I've used the Krita and Photoshop plugins and have even built a Civitai MCP server for bringing in models. AFAIK nobody else has done this yet.
None of these are hands on in the right way.
Thanks for the links. I've added HiDream-I1 to the prompt adherence comparison chart. From my testing, it has adherence capabilities comparable to Flux.
https://genai-showdown.specr.net
Just my reading, not well separated from my own views, but: he wants a steam-powered paintbrush with a thousand buttons that only professionally trained artists can use and that does nothing in the hands of an average person. He's done with proxy manipulation through metadata such as fabricated museum captions, negative Danbooru tags, and lines painted over existing works. That isn't exactly the problem definition made above, nor a fair description of the problem, and it's the opposite of "democratization" concepts, but I do believe that's what it is. It's also what Photoshop is, anyway. Untrained users can barely draw a smiley with Photoshop open on a Wacom display.
If I really think about it, it just feels weird to me that fabricated metadata is supposed to be enough to yield art. Metadata by definition doesn't contain the data; it's management data artificially made to be as disconnected from the data as possible. Whatever connections remain between the two are basically unsafe side effects.
I wish OpenAI and its followers would quit setting bridges ablaze left and right, though I know that's a tall order.
Here's some surreal art I made today as an example: https://9ol.es/dress.mp4 ... this was UNO/Wan/Kdenlive, via Pinokio for the first two ... there's AI slop, and then there's AI as an interesting new medium for exploring the strange ... that's what I want to do more of.
The song is https://www.youtube.com/watch?v=6K2U6SuVk5s
Yeah, we all have our own workflows. For me, I usually have a very specific visual concept in mind (which I will block out roughly on graph paper if need be). I can usually get to where I want to go with a combination of inpainting and various types of ControlNets - there's a sketch of that combo after the example below.
Like this: (Created by noted thespian Gymnos Henson)
https://specularrealms.com/wp-content/uploads/2024/11/Gorgon...
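The inpainting + ControlNet combination looks roughly like this in diffusers (a sketch, not my actual pipeline; checkpoints, filenames, and the prompt are placeholders):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image

# Scribble ControlNet so a rough sketch can steer composition
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

init_image = load_image("scene.png")        # image being reworked
mask_image = load_image("mask.png")         # white = region to repaint
control_image = load_image("scribble.png")  # the rough graph-paper sketch

image = pipe(
    prompt="a gorgon emerging from stone, cinematic lighting",
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    num_inference_steps=30,
).images[0]
image.save("result.png")
```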
Does time pass slower on the English Internet?