I too find it unreadable. I guess that's the downside of working on this stuff every day: you get to really hate seeing it.
It does tell you that if even 95% of HN can't tell, then 99% of the public can't tell. Which is pretty incredible.
Cheers, it's gotta be the "I see this every day for hours" thing - I have a hard time mentioning it because there are a bunch of people who would like to think they have similar experience and yet don't see the same tells. But for real, I've been on these 8+ hours a day for 2 years now.
And it sounds like you have the same surreal experience as me... it's so blindingly. obvious. that the only odd thing is people not mentioning it.
And the tells are so tough, like, I wanted to bang a drum over and over again 6 weeks ago about the "It's not X, it's Y" thing. I thought it was a GPT-4.1 tell.
Then I found this under-publicized gent doing God's work: a ton of benchmarks, one of them measuring "Not X, but Y" slop, and it turned out there were 40+ models ahead of GPT-4.1 on it, including Gemini (expected, crap machine IMHO) and Claude, and I never would have guessed the Claudes. https://x.com/sam_paech/status/1950343925270794323
Can confirm it's "Not just the GPTs - it's all of the frontier models" that are addicted to that one.
IME the only reliable way around it when using an LLM to create blog-like content is to keep an actual hard list of slop patterns to rewrite/avoid. This works pretty well if done correctly. There actually aren't that many patterns (not hundreds, more like dozens), so they're pretty enumerable. On the other hand, you and I would still be able to tell even if only those things were rewritten.
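A minimal sketch of what such a hard list might look like in Python. The specific patterns and replacements here are my own illustrative guesses (a real list would be hand-tuned over time), but the shape is the point: a small, enumerable table of regexes, plus a flagging pass for manual review and a blunt mechanical rewrite pass.

```python
import re

# Hypothetical slop list: a few of the "dozens, not hundreds" of patterns.
# Entries are (pattern, replacement); these examples are illustrative only.
SLOP_PATTERNS = [
    # "It's not X - it's Y" / "It's not just X, it's Y" framing: keep only Y.
    (re.compile(r"\b[Ii]t'?s not (?:just )?(?P<x>[^.,;-]+)\s*[-,]\s*it'?s (?P<y>[^.;]+)"),
     r"\g<y>"),
    # Intensity inflation: everything is "crucial".
    (re.compile(r"\bcrucial\b"), "useful"),
    (re.compile(r"\bdelve into\b"), "look at"),
]

def flag_slop(text):
    """Return the patterns that match, for manual review/rewriting."""
    return [pat.pattern for pat, _ in SLOP_PATTERNS if pat.search(text)]

def rewrite_slop(text):
    """Mechanically apply the replacement list (a blunt first pass)."""
    for pat, repl in SLOP_PATTERNS:
        text = pat.sub(repl, text)
    return text
```

In practice the flagging pass is probably more useful than the automatic rewrite, since the replacements need human judgment; the point of the hard list is just that the pattern set is small enough to enumerate.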
Overall the number one thing is that the writing is "overly slick". I've seen this expressed in tons of ways but I find slickness to be the most apt description. As if it's a pitch, or a TED presentation script, that has been pored over and perfected until every single word is optimized. Very salesy. In a similar vein, in LLM-written text, everything is given similar importance. Everything is crucial, one of the most powerful X, particularly elegant, and so on.
I find Opus to have the lowest slop ratio, which this benchmark kind of confirms [1], but of course its pricing is a barrier.
[1] https://eqbench.com/creative_writing_longform.html