A lot of people started realizing that it didn’t really matter how pretty the resulting image was if it completely failed to adhere to the prompt.

Even something like Flux.1 Dev, which can be run entirely locally and was released back in August of 2024, has significantly better prompt understanding.

Yeah, though there is the same issue the other way round: great prompt understanding doesn't matter much when the result has an awfully ugly AI fake look to it.

That's definitely true, and the medium really makes a big difference as well (photorealism, digital painting, watercolor, etc.).

Though in some cases, it is a bit easier to fix visual artifacts (using second-pass refiners, Img2Img, ultimate upscale, stylistic LoRAs, etc.) than to fix a fundamental coherency problem.

I was disappointed when Imagen 4 (and therefore also Nano Banana Pro, which clearly uses Imagen 4 internally to some degree) had a significantly stronger tendency to drift from photorealism to AI fake aesthetics than Imagen 3. This suggests there is a tradeoff between prompt following and avoiding slop style. Perhaps this is also part of the reason why Midjourney isn't good at prompt following.