It's crazy to think there was a fleeting sliver of time during which Midjourney felt like the pinnacle of image generation.

Whatever happened to Midjourney?

No external funding raised. They're not on the VC path, so no need to chase insane growth. They still have around 500M USD in ARR.

In my (very personal) opinion, they're part of a very small group of organizations that sell inference under a sane and successful business model.

Not on the VC path. Not even on the max-profit path. Just on the "Have fun doing cool research" path.

I was a mod on MJ for its first few years and got to know MJ's founder through discussions there. He already had "enough" money for himself from his prior sale of Leap Motion to do whatever he wanted. And, he decided what he wanted was to do cool research with fun people. So, he started MJ. Now he has far more money than before and what he wants to do with it is to have more fun doing more cool research.

Aesthetically, still unmatched

Apparently image models have to choose between aesthetics and photorealism. Many aren't good at either.

They're working on a few really lofty ideas:

1. Real-time world models for the "holodeck". It has to be fast, high quality, and inexpensive for lots of users. They started on this two years ago, before "world model" hype was even a thing.

2. Some kind of hardware to support this.

David Holz talks about this on Twitter occasionally.

Midjourney still has incredible revenue. It's still the best looking image model, even if it's hard to prompt, can't edit, and has artifacting. Every generation looks like it came out of a magazine, which is something the other leading commercial models lack.

They have image and video models that are nowhere near SOTA on prompt adherence or image editing but pretty good on the artistic side. They lean into features like reference images so objects or characters keep a consistent look, biasing the model toward your style preferences, or using moodboards to generate a consistent style.

A lot of people started realizing that it didn’t really matter how pretty the resulting image was if it completely failed to adhere to the prompt.

Even something like Flux.1 Dev, which can be run entirely locally and was released back in August 2024, has significantly better prompt understanding.

Yeah, though there is the same issue the other way round: great prompt understanding doesn't matter much when the result has an awfully ugly AI-fake look to it.

That's definitely true, and the medium also really makes a big difference as well (photorealism, digital painting, watercolor, etc.).

Though in some cases, it is a bit easier to fix visual artifacts (using second-pass refiners, Img2Img, ultimate upscale, stylistic LoRAs, etc.) than a fundamental coherency problem.
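
For instance, here is a minimal sketch of that kind of second-pass Img2Img refinement using Hugging Face diffusers; the checkpoint, filenames, and parameter values are illustrative assumptions, not a specific recommendation:

    import torch
    from PIL import Image
    from diffusers import AutoPipelineForImage2Image

    # Low-strength img2img pass over an existing generation: keeps the
    # composition but lets the model clean up artifacts and add detail.
    pipe = AutoPipelineForImage2Image.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",  # assumed refiner checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    init = Image.open("first_pass.png").convert("RGB")  # hypothetical input image
    refined = pipe(
        prompt="same subject, sharper detail, photorealistic",
        image=init,
        strength=0.3,        # low strength = mostly preserve the original
        guidance_scale=5.0,
    ).images[0]
    refined.save("second_pass.png")

Note that this kind of pass can polish surface-level artifacts, but it won't rescue an image whose basic composition ignored the prompt.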

I was disappointed when Imagen 4 (and therefore also Nano Banana Pro, which clearly uses Imagen 4 internally to some degree) had a significantly stronger tendency to drift from photorealism to AI fake aesthetics than Imagen 3. This suggests there is a tradeoff between prompt following and avoiding slop style. Perhaps this is also part of the reason why Midjourney isn't good at prompt following.

Not much, while everything happened at OpenAI/Google/Chinese companies. And that's the problem.

How is it a problem? There simply doesn't seem to be a moat or secret sauce. Who cares which of these models is SOTA? In two months there will be a new model.

There does seem to be a moat: infrastructure/GPUs and talent. The best models right now come from companies with considerable resources and funding.

Right, but that's a short term moat. If they pause on their incredible levels of spending for even 6 months, someone else will take over having spent only a tiny fraction of what they did. They might get taken over anyway.

> someone else will take over having spent only a tiny fraction of what they did

How? By magic? You fell for "DeepSeek V3 is as good as SOTA"?

By reverse engineering, sheer stupidity from the competition, corporate espionage, "stealing" engineers, and sometimes a stroke of genius, the same as it's always been.

They still have a niche. Their style references feature is their key differentiator now, but I find I can usually just drop some images of an MJ style into Gemini and get it to give me a text prompt that works just as well as MJ srefs.
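
A rough sketch of that workflow with the google-generativeai Python SDK; the model name, filenames, and prompt wording are just assumptions:

    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

    # Hand Gemini a few images in the style you like and ask it to distill
    # them into a reusable text prompt for another image generator.
    refs = [Image.open(p) for p in ["sref_1.png", "sref_2.png"]]  # hypothetical files
    resp = model.generate_content(
        refs
        + [
            "Describe the shared visual style of these images as a single "
            "reusable text-to-image prompt (lighting, palette, medium, mood)."
        ]
    )
    print(resp.text)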

Isn’t it still? Anecdotally, I work with lots of creators who still prefer it because of its subjective qualities.

The pace of commoditization in image generation is wild. Every 3-4 months the SOTA shifts, and last quarter's breakthrough becomes a commodity API.

What's interesting is that the bottleneck is no longer the model — it's the person directing it. Knowing what to ask for and recognizing when the output is good enough matters more than which model you use. Same pattern we're seeing in code generation.

PLEASE STOP POSTING AI GENERATED COMMENTS

SOTA shifts, yes. But the average person doing the work has been very happy with SDXL based models. And that was released two years ago.

The fight right now, outside of API SOTA, is over who will replace SDXL as the “community preference”.

It’s now a three-way race between Flux2 Klein, Z-Image, and Qwen2.

I'm happy the models are becoming commodity, but we still have a long way to go.

I want the ability to lean into any image and tweak it like clay.

I've been building open source software to orchestrate the frontier editing models (skip to halfway down), but it would be nice if the models were built around the software manipulation workflows:

https://getartcraft.com/news/world-models-for-film