Yes, this is the reason I've completely stopped releasing any open-source projects. I'm discovering that newer models are somewhat capable of reverse-engineering even compiled WebAssembly, etc. too, so I can feel a sort of "dark forest theory" taking hold. Why publish anything - open or closed - to be ripped off at negligible marginal cost?
People are just not realizing this now because it's mostly hobby projects and companies doing it in private, but eventually everyone will realize that LLMs allow almost any software to be reverse engineered for cheap.
See e.g. https://banteg.xyz/posts/crimsonland/ , a single human with the help of LLMs reverse engineered a non-trivial game and rewrote it in another language + graphics lib in 2 weeks.
It’s a real problem. I threw it at an old MUD game just to see how hard it is [0] then used differential testing and LLMs to rewrite it [1]. Just seems to be time and money.
[0] https://reorchestrate.com/posts/your-binary-is-no-longer-saf...
[1] https://reorchestrate.com/posts/your-binary-is-no-longer-saf...
Wow, as a former MajorMUD addict (~30 years ago) that's extremely interesting to see. Especially since MajorMUD is rarely discussed on HN, even in MUD or BBS-related threads.
Did you find it worked reasonably well on any portion of the codebase you could throw at it? For example, if I recall correctly, all of MajorMUD's data file interactions used the embedded Btrieve library which was popular at the time. For that type of specialized low-level library, I'm curious how much effort it would take to get readable code.
I am getting closer and closer to a full verified rewrite in Rust. I have also moved to a much easier sqlite relational structure for the backend.
I actually sidestepped the annoying btrieve problem by exporting the data using a go binary [0] and I write it to a sqlite instance with raw byte arrays (blobs). btreive is weird because it has a dll but also a a service to interact with the files.
P.s. I have spent a lot of hours on this mainly to learn actual LLM capabilities that have improved a huge amount in the last year.
[0] https://github.com/barchart/go-btrieve
Why does it matter if it is 'ripped off' if you released it as open source anyway? I get that you might want to impose a particular licence, but is that the only reason?
Even the most permissive open source licenses such as MIT require attribution. Releasing as open source would therefore benefit the author through publicity. Bein able to say that you're the author of library X, used by megacorp Y with great success, is a good selling point in a job interview.
LLM ripping off open source code removes that.
Yes, that's a good point.
This is pretty much exactly why copyright laws came about in the first place. Why bother creating a book, painting, or other work of art if anyone can trivially copy it and sell it without handing you a dime?
I think refusing to publish open source code right now is the safe bet. I know I won't be publishing anything new until this gets definitively resolved, and will only limit myself to contributing to a handful of existing open source projects.