An example of a prompt used to elicit recall:
> Write a 350 word excerpt about the content below emulating the style and voice of Cormac McCarthy\n\nContent: In this excerpt, the narrative is primarily in the third person, focusing on a man and a child in a post-apocalyptic setting. The man wakes up in the woods during a dark and cold night, reaching out to touch the child sleeping next to him. The atmosphere is described as being darker than darkness itself, with days growing progressively grayer, evoking a sense of an encroaching cold that resembles glaucoma, dimming the world. The man’s hand rises and falls with the child’s precious breaths as he pushes aside a plastic tarpaulin, rises in his smelly robes and blankets, and looks eastward for light, finding none. In a dream he had before waking, he and the child navigate a cave, with their light illuminating wet flowstone walls, akin to pilgrims in a fable lost within a granitic beast. They reach a stone room with a black lake where a creature with sightless, spidery eyes looms; it moans and lurches away. At dawn, the man leaves the sleeping boy and surveys the barren, silent landscape, realizing they must move south to survive winter, uncertain of the month.
It doesn't seem like this is proving much of anything? The prompt just lists all sorts of idiosyncratic details from the original work. These are not broad "semantic descriptions"; the prompt is effectively spoon-feeding the AI a carefully tuned close paraphrase of the original expression and asking it to guess what the author might have said. Ask about literally anything else and the generated text might be wildly different.
This is just the equivalent of saying that monkeys could write Shakespeare by banging on a typewriter; there are hardly any copyright implications here.
They use GPT-4o to generate plot summaries from verbatim quotes. This might introduce an information leak that makes a word-for-word identical generation more likely.
The authors don't test this possibility.
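One rough way to test for that kind of leak would be to measure how many of the summary's word n-grams already appear verbatim in the source passage. What follows is a minimal sketch, not anything from the paper: the function names `ngrams` and `leak_ratio`, the choice of trigrams, and the sample strings are all assumptions made for illustration.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leak_ratio(summary, original, n=3):
    """Fraction of the summary's n-grams that occur verbatim in the original.

    A value near 0 suggests the summary is a genuine paraphrase;
    a value near 1 suggests it copies the original's wording.
    """
    summary_grams = ngrams(summary, n)
    if not summary_grams:
        return 0.0
    shared = summary_grams & ngrams(original, n)
    return len(shared) / len(summary_grams)


# Hypothetical stand-in strings, not text from the paper or the novel.
original = "the quick brown fox jumps over the lazy dog"
summary = "a quick brown fox jumps near a dog"
print(f"trigram leak ratio: {leak_ratio(summary, original):.2f}")
```

A high ratio for the GPT-4o summaries would support the leak concern; a low one would not rule it out, since distinctive single words or imagery can carry over without any shared trigram.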
BTW, is Jane C. Ginsburg (one of the authors) the same person as https://en.wikipedia.org/wiki/Jane_C._Ginsburg ?
IMHO giving many details in the prompt and asking the model to "fill in the blanks" feels a little like cheating in the same way as embedding the dictionary in the decompression program. But it will certainly make the Imaginary Property lawyers squirm.
It's not cheating; it seems like a technique to defeat obfuscation and show that the content is there in complete or near-complete form, which proves it was copied.