This is blowing my mind.
I asked Kimi K2.6 to write a blog post in the style of James Mickens.[0] Then I fed the output to Opus 4.7 and asked it who the likely author was, and it correctly identified it as an imitation of James Mickens[1]:
> Based on the stylistic fingerprints in this text, the most likely author is a pastiche/imitation of the style of several writers fused together, but if forced to identify a single likely author, the strongest candidate is someone writing in the voice of James Mickens
> [...]
> The piece could also be a deliberate imitation/homage to Mickens written by someone else, or AI-generated text trained on his style, since the voice is so distinctive it's frequently parodied.
[0] https://kagi.com/assistant/5bfc5da9-cbfc-4051-8627-d0e9c0615...
[1] https://kagi.com/assistant/fd3eca94-45de-4a53-8604-fcc568dc5...
> it correctly identified it as an imitation of James Mickens
How likely is it that the model is taking into account that it knows for certain this can't be anything from Mickens in its training data? I'd be curious whether it would correctly attribute a new piece of his, published after the training cutoff, before it gets trained on it.
This is unlikely. The model retains only a lossy representation of James Mickens's writing; very likely it cannot repeat his writing verbatim, nor can it reason about the training cutoff in this manner.
> It's a lossy representation
I haven't been following it closely, but isn't part of the NYT lawsuit against OpenAI that it sometimes spits out NYT articles verbatim?
Study: Meta AI model can reproduce almost half of Harry Potter book
https://arstechnica.com/features/2025/06/study-metas-llama-3...
See also GEMA vs. OpenAI.
It is lossy, but it is still enough for verbatim recreations. All of Wikipedia is just 24 GB of losslessly compressed text, and all of J.K. Rowling's work fits into a few MB, so these would easily be storable verbatim in trillion-parameter models. Reasoning about the training cutoff is also something the newest models do pretty well, because you can teach them to do so after pre-training, e.g. via SFT. With tool use, the model can then even check actual current sources, which may happen without you even knowing in the normal chat apps unless you use a controlled API call.
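A quick back-of-envelope sketch of the capacity argument, using the figures from the comment above (the bytes-per-parameter value is an assumption, e.g. int8 weights):

```python
# Back-of-envelope: how does a trillion-parameter model's raw capacity
# compare to the size of compressed Wikipedia?
params = 1_000_000_000_000      # one trillion parameters
bytes_per_param = 1             # assumption: ~1 byte per parameter (int8)
model_bytes = params * bytes_per_param

wikipedia_bytes = 24 * 10**9    # ~24 GB of losslessly compressed text

ratio = model_bytes / wikipedia_bytes
print(f"model capacity is ~{ratio:.0f}x compressed Wikipedia")  # ~42x
```

Of course the weights aren't free storage (they also have to encode everything else the model knows), but the raw sizes make verbatim memorization of popular texts plausible rather than impossible.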
How do you know how the model works? If there were an index of all of Mickens's writings, or if the model searched the web before feeding you the response, you wouldn't be able to tell by observing from the outside.
I suppose a quick test would be getting the model to write out a Mickens essay end to end.
If the original essay were stuffed into the prompt window, the result would be word-accurate,
unless this is a model trained specifically on Mickens's essays (which Claude is not).
That's in the ideal scenario where it's only seen a single copy of it, though.
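One way to score the quick test proposed above is a simple word-level accuracy check between the original essay and the model's reproduction. A minimal sketch (the strings here are placeholders, not actual essay text):

```python
def word_accuracy(original: str, reproduction: str) -> float:
    """Fraction of word positions where the two texts agree exactly."""
    a, b = original.split(), reproduction.split()
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

# Placeholder example: three of four words match.
print(word_accuracy("the quick brown fox", "the quick brown cat"))  # 0.75
```

A score near 1.0 would suggest verbatim recall (or the essay being in the prompt window); a lower score with preserved style would suggest a lossy representation.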
Haven’t there been repeated experiments that show if you jailbreak most frontier models’ harnesses you can get them to output near verbatim copyrighted works?
I swear there was a whole court case about this in the last year.
That's neat, though it impresses me less than the article. Mickens has a very particular style that this comes very close to but doesn't quite capture, and I think I would have identified your post as an imitation of him. On the other hand, I absolutely couldn't have identified any of the quoted sections as Kelsey's, despite having read a ton of her writing.
It is very close, but what's more interesting to me is that it's actually amusing. I've yet to see an LLM be originally funny (it's entirely possible I've missed the crossing of that line), and the opening lines put a wry grin on my face.
FYI: I copy-pasted the first few paragraphs of the first link into Pangram, and it correctly identified them as AI-written: https://www.pangram.com/history/790fc2b8-6348-47fa-ad3e-8bae...
What does it say when you feed it a real Mickens article (a recent one not in the training set)?
I wouldn't be too impressed at an n of 1.
He hasn't published anything recently, so I can't test with Mickens, but I tested with my own writing[0], and Opus got it right.
[0] https://news.ycombinator.com/item?id=47970008
This is much less impressive considering how Chinese models are usually copies of American models.