If nothing else, rosmine's DFT [1], which is what they were working on with this setup, seems like a worthwhile investigation.

While I'm skeptical that there is much of a moat, at least for the large players, it should hopefully set rosmine up for the next job :)

It does seem to fix the biggest current issues with using LLMs for writing at various publishers. If you're The Economist, you have a very specific house style and a decent corpus of articles written in that style. At least on my reading of it, rosmine can use DFT to get a model's outputs to closely match, in terms of the language quirks that are generated, the corpus it is fine-tuned on. I.e. it will very much match the house style, particularly as it is used in actual writing. Compare that with giving a system prompt to an LLM that has some Economist articles in its vast training set and telling it to write in that style: it will do an OK job, but still exhibit LLM language quirks despite itself. Even if you feed it the specific "style guide" that they give their authors, I dare say the reality of their writing is the best place to learn from, and it sounds like DFT can ground the writing of a model in a specific corpus like that.

[1]: https://rosmine.ai/2026/05/18/fixing-llm-writing-with-distri...

Giving an LLM samples and telling it to apply the style in the samples works a lot better than just telling it to copy a style it may have seen, or giving it a list of rules.

They do it well enough that it'd take really good output to beat.
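
For illustration, a minimal sketch of what "giving samples" can look like in practice. The helper name `build_style_prompt`, the prompt wording, and the placeholder excerpts are all assumptions made up for this example; it only assembles text and does not call any particular model or API.

```python
def build_style_prompt(samples: list[str], task: str) -> str:
    """Assemble a prompt that shows writing samples before stating the task.

    Hypothetical helper: the wording below is one possible phrasing,
    not a tested or recommended prompt.
    """
    parts = ["Here are samples of the target writing style:"]
    for i, sample in enumerate(samples, start=1):
        # Label each sample so the model can tell where one ends.
        parts.append(f"--- Sample {i} ---\n{sample.strip()}")
    parts.append(
        "Write the following in the same style as the samples above. "
        "Match their vocabulary, sentence rhythm, and punctuation habits."
    )
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


# Placeholder excerpts; in real use these would come from your own corpus.
prompt = build_style_prompt(
    ["First placeholder excerpt.", "Second placeholder excerpt."],
    "Summarise this week's chip export news in 200 words.",
)
print(prompt)
```

The samples go first and the task last, so the instruction to imitate refers back to text the model has just read rather than to a style it may only half remember from training.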

They really don't.

If your goal is to, say, write science fiction, their reversion to classic LLM-isms is really distracting and is what makes people say at a glance that it was written by an LLM. You basically can't use them at the moment in any real "natural" long-form writing. Everyone will call "slop" pretty quickly on the current frontier models.

rosmine's DFT paper is worth a read.

I have seen examples that show otherwise, including from a client that tested it extensively by paying people who thought they were being paid to help detect AI-generated content. They did little more than what I described. It works very well. Some people still insist they are able to tell the difference, but in the tests I saw, people did little better than random chance.

Some of it you could probably tell with statistical analysis, but actual people are far worse at judging whether content is AI-generated than they think they are.
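
To put a number on "little better than random chance", here is a minimal sketch of a one-sided binomial test. It assumes each judgement is an independent human-vs-AI guess with a 50% baseline, and the counts are invented for illustration; they are not from the tests described above.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """Probability of getting at least `correct` answers right out of `total`
    by pure guessing, where each guess succeeds with probability `chance`."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical result: a rater labels 100 texts and gets 54 right.
p = binomial_p_value(correct=54, total=100)
print(f"p = {p:.3f}")
```

A large p-value means the score is easy to reach by guessing, so it is no evidence that the rater can actually tell the difference. Something like 70 out of 100 would be a very different story.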

If you need to beat an AI testing tool, you need to do marginally more work than to stop people from recognising it, but not all that much.

The nature of it is that you don't "see" most of the stuff that is well done because few people want to talk about it.