I found the horse revenge-porn image at the end quite disturbing.

It's the year of the horse in their zodiac. The (translated) prompt is wild:

""" A desolate grassland stretches into the distance, its ground dry and cracked. Fine dust is kicked up by vigorous activity, forming a faint grayish-brown mist in the low sky. Mid-ground, eye-level composition: A muscular, robust adult brown horse stands proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man. Its hind legs are taut, its neck held high, its mane flying against the wind, its nostrils flared, and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat, his short, messy dark brown hair plastered to his forehead, his thick beard slightly damp; he wears a badly worn, grey-green medieval-style robe, the fabric torn and stained with mud in several places, a thick hemp rope tied around his waist, and scratched ankle-high leather boots; his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight. The background is a range of undulating grey-blue mountains, their outlines stark, their peaks hidden beneath a low-hanging, leaden-grey, cloudy sky. The thick clouds diffuse a soft, diffused light, which pours down naturally from the left front at a 45-degree angle, casting clear and voluminous shadows on the horse's belly, the back of the man's hands, and the cracked ground. The overall color scheme is strictly controlled within the earth tones: the horsehair is warm brown, the robe is a gradient of gray-green-brown, the soil is a mixture of ochre, dry yellow earth, and charcoal gray, the dust is light brownish-gray, and the sky is a transition from matte lead gray to cool gray with a faint glow at the bottom of the clouds. The image has a realistic, high-definition photographic quality, with extremely fine textures—you can see the sweat on the horse's neck, the wear and tear on the robe's warp and weft threads, the skin pores and stubble, the edges of the cracked soil, and the dust particles. The atmosphere is tense, primitive, and full of suffocating tension from a struggle of biological forces. """

I think they call it "horse riding a human" which could have taken two very different directions, and the direction the model seems to have taken was the least worst of the two.

At first I thought it's a clever prompt because you see which direction the model takes it, and whether it "corrects" it to the more common "human riding a horse" similar to the full wine glass test.

But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.

"... A muscular, robust adult brown horse standing proudly, its forelegs heavily pressing between the shoulder blades and spine of a reclining man ... and its eyes sharp and focused, exuding a primal sense of power. The subdued man is a white male, 30-40 years old, his face covered in dust and sweat ... his body is in a push-up position—his palms are pressed hard against the cracked, dry earth, his knuckles white, the veins in his arms bulging, his legs stretched straight back and taut, his toes digging into the ground, his entire torso trembling slightly from the weight ..."

> But if you translate the actual prompt the term riding doesn't even appear. The prompt describes the exact thing you see in excruciating detail.

Yeah, as they go through their workflow earlier in the blog post, that prompt they share there seems to be generated by a different input, then that prompt is passed to the actual model. So the workflow is something like "User prompt input -> Expand input with LLMs -> Send expanded prompt to image model".

So I think "human riding a horse" is the user prompt, which gets expanded to what they share in the post, which is what the model actually uses. This is also how they've presented all their previous image models, by passing user input through a LLM for "expansion" first.

Seems poorly thought out not to make it 100% clear what the actual humanly-written prompt is though, not sure why they wouldn't share that upfront.

Is it related to "Mr Hands" ?

Wont someone think of the horses.