Thanks for the links.
That first link shows people actively pulling out GPL code in 2023 and marketing around that fact, though. That's not great evidence that they're not doing it now, especially if testing whether GPL code is still in there is as easy as prompting with an incomplete piece of it.
I'd think that companies could amass a collection of all known GPL code and test for it regularly in order to refine their methods for keeping it out.
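To make that concrete, here's a rough sketch in Python of what such a recurring check could look like. Everything in it is hypothetical: `complete` stands in for whatever call returns a model's completion for a prompt, the corpus is just an iterable of known GPL snippets, and the 0.9 threshold is an arbitrary guess, not something any vendor is known to use.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


def split_prompt(snippet: str, fraction: float = 0.5) -> Tuple[str, str]:
    """Split a snippet into a prefix (the prompt) and the held-back remainder."""
    cut = int(len(snippet) * fraction)
    return snippet[:cut], snippet[cut:]


def regurgitation_score(complete: Callable[[str], str], snippet: str) -> float:
    """Prompt with the first half of a snippet and measure how closely the
    completion matches the held-back second half (0.0 to 1.0)."""
    prefix, expected = split_prompt(snippet)
    if not expected:
        return 0.0  # nothing held back, so nothing to compare against
    produced = complete(prefix)[: len(expected)]
    return SequenceMatcher(None, produced, expected, autojunk=False).ratio()


def flag_snippets(
    complete: Callable[[str], str],
    corpus: Iterable[str],
    threshold: float = 0.9,  # arbitrary cutoff, chosen for illustration only
) -> List[str]:
    """Return the snippets the model appears to reproduce near-verbatim."""
    return [s for s in corpus if regurgitation_score(complete, s) >= threshold]
```

Run over the collection on every new model version, anything `flag_snippets` returns would be a candidate for whatever filtering method is being refined. A character-level similarity ratio is only one possible matching rule; it would miss reformatted or renamed copies.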
> (which it could not do if identifying LGPL code was pulled from the codebase)
Are you sure about this? Linking to LGPL code is fine afaik. And why not train on code that links to universally available libraries that are legal to use? Seems like one might even prefer it.
Seems like this was rejected for size and slop reasons, not licensing. If the submitter of the PR isn't even fixing possibly hallucinated authors' names, it's obvious that they didn't really read it. Debugging vibe-coded stuff is like finding an indeterminate number of needles in a haystack.