AI is being used in many retro game decomp projects!

One of the reasons I went down the path of learning decomp myself was that AI had hit a wall. Matching decomp is quite a bit harder than ordinary decomp, because even simple choices like using an if/else instead of a ternary actually change the assembly. AI did an amazing job of getting to 95% matches on nearly all functions, but once it got to that tail end it started to struggle a lot and would often just claim "it's impossible". So that's when I pivoted and started learning actual decomp myself, so that I could prompt AI better and finish off the Star Fox Adventures decomp!
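To make "95% match" concrete, here is a rough sketch of how a match score between a compiled candidate and the original function bytes could be computed. It is only an illustration: `match_percent` is a hypothetical helper, and it compares raw bytes with Python's `difflib`, whereas real decomp tooling typically diffs at the instruction level.

```python
from difflib import SequenceMatcher


def match_percent(target: bytes, candidate: bytes) -> float:
    """Return a 0-100 similarity score between two byte strings.

    Hypothetical helper: a byte-level approximation of a "match percentage".
    """
    if not target and not candidate:
        # Two empty functions are trivially identical.
        return 100.0
    # autojunk=False disables difflib's heuristic that ignores very common
    # elements, which would skew scores on long, repetitive machine code.
    matcher = SequenceMatcher(None, target, candidate, autojunk=False)
    return 100.0 * matcher.ratio()
```

With this scoring, an 8-byte function with a single wrong byte comes out at 87.5. For matching decomp, anything below 100 still counts as not done, which is why the last few percent are where the work is.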

I say this every time it comes up, but polluting a decomp project with AI generated code is risky, imho. What makes decomp legal (in the US) is that it’s a creative transformation performed by a human and the resulting copyright of the code that just happens to compile to the same binary is owned by the person doing the decompiling.

USPTO and court precedent are leaning heavily toward LLM output not being transformative on its own, making it mechanical, no longer fair use, and in violation of copyright. This puts a legal gray cloud over a project where most contributors couldn’t defend themselves if a rights holder goes after it, and there’s a high likelihood that the rights holder would succeed. On the other hand, there’s enough case law protecting human decompilation that even the most litigious game companies don’t go after decomp projects that have historically been done by humans.

(I’m not a lawyer, I’m not your lawyer, this is not legal advice, etc., etc.)

Does its being a creative transformation strip the result of its derivative-work status? Personally, I'd liken the process of decompilation to that of translating a book from one language to another - the copyright on the original work does not become void merely because the process of translation requires extensive creativity.

Nicalis and Take-Two have both gone after decompilation projects, too. In particular, Nicalis has gone after a decompilation of Cave Story, but not a black box reimplementation of the same, while Take-Two ended up suing a decompilation developer (although the case was settled out of court). However, in some jurisdictions, even clean reimplementations have failed - see Tetris v. Xio.

(I am not a lawyer either, etc etc, but that's my understanding)

The RE3 devs were distributing binaries. This is known to be an issue. The source code is theirs, binaries mixed with other copyrighted content is not. They also allegedly violated a EULA, but I haven’t looked closely into that.

CSE2 was distributing binaries as well.

So was the SM64 decomp. Nintendo told them to stop; they did, and continued to share their source code.

Tetris v. Xio is unrelated to reverse engineering or decompilation.

> The source code is theirs, binaries mixed with other copyrighted content is not.

Distributing binaries should not matter. If the binary is just compiled from the source code, the binary is just as (non-)infringing as the source code.

> They also allegedly violated a EULA

Meaningless. EULAs are not the law.

If the sources produce significant sequences that match the original binary, that binary is probably infringing. The community gets around this by distributing binary patches instead.
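For anyone unfamiliar with the mechanics of a binary patch, here is a minimal sketch, assuming the original and modified files have the same length. The patch records only offsets and replacement bytes, so it is useless without your own copy of the original. `make_patch` and `apply_patch` are hypothetical names; real projects use established formats such as IPS or xdelta rather than anything this simple.

```python
def make_patch(original: bytes, modified: bytes) -> list[tuple[int, bytes]]:
    """Record each run of differing bytes as (offset, replacement bytes)."""
    if len(original) != len(modified):
        raise ValueError("this sketch only handles same-length files")
    patch = []
    i, n = 0, len(original)
    while i < n:
        if original[i] != modified[i]:
            start = i
            # Extend the run until the two files agree again.
            while i < n and original[i] != modified[i]:
                i += 1
            patch.append((start, modified[start:i]))
        else:
            i += 1
    return patch


def apply_patch(original: bytes, patch: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the modified file from the user's own original plus the patch."""
    out = bytearray(original)
    for offset, data in patch:
        out[offset:offset + len(data)] = data
    return bytes(out)
```

Note that the patch itself never contains the unchanged bytes of the original, only the parts that differ.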

In some cases the EULA is what gives you access to the original binary to begin with (often the case with digital marketplaces, but also true of some physical media after 2010 or so). These terms have little to do with copyright; they concern whether someone could access the binary in the first place. To the best of my knowledge this contract law has largely been considered valid, but I try to avoid working on things with EULAs, so I haven’t looked into it in depth.

These decomp projects are already violating copyright by distributing the decompiled source code. Using LLMs is less risky than sharing the code.

Before the advent of LLMs, ML was used in upscaling the assets and pre-rendered backgrounds of the first 3 classic Resident Evil games: https://www.reshdp.com

Matching decomp would require the same compiler and flags as the original game, right? How is that determined?

Experimentally, I think. There are only a few dozen options, and you can winnow it down to just a few pretty quickly.
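A rough sketch of what that experimental search could look like: compile a function you already believe is correct with every candidate compiler and flag combination, and keep the ones whose output is byte-identical to the original. Everything here is an assumption for illustration. `find_matching_flags` is a hypothetical helper, and `compile_fn` is a stand-in you would have to write yourself to invoke the real toolchain and return the object bytes.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Sequence


def find_matching_flags(
    source: str,
    target: bytes,
    compilers: Iterable[str],
    flag_groups: Sequence[Sequence[str]],
    compile_fn: Callable[[str, str, tuple[str, ...]], bytes],
) -> Iterator[tuple[str, tuple[str, ...]]]:
    """Yield every (compiler, flags) pair whose output equals `target`.

    Each entry in `flag_groups` is a set of mutually exclusive choices,
    for example optimization levels. An empty string means "flag omitted".
    """
    for compiler in compilers:
        # product() picks one choice from each group per combination.
        for combo in product(*flag_groups):
            flags = tuple(flag for flag in combo if flag)
            if compile_fn(compiler, source, flags) == target:
                yield compiler, flags
```

With a handful of compiler versions and a few flag groups this is only a few dozen builds, and checking more than one known-good function narrows down any remaining ties.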