> But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.
Yeah, as they go through their workflow earlier in the blog post, that prompt they share there seems to be generated by a different input, then that prompt is passed to the actual model. So the workflow is something like "User prompt input -> Expand input with LLMs -> Send expanded prompt to image model".
So I think "human riding a horse" is the user prompt, which gets expanded to what they share in the post, which is what the model actually uses. This is also how they've presented all their previous image models, by passing user input through a LLM for "expansion" first.
Seems poorly thought out not to make it 100% clear what the actual humanly-written prompt is though, not sure why they wouldn't share that upfront.