Last night I read an article about eliminating lag in emulators. It's done with a similar concept. Basically, for each frame, the emulator calculates the state for different button combinations. Then, based on the button you actually push, the state shown moves to the precalculated one.

Not 100% familiar with exactly what this is, but I am familiar with run-ahead, which is basically the same idea as GGPO, but applied to eliminating input lag:

- Run the emulator one frame normally, using real polled input. Somehow snapshot the state.

- Then, run the emulator n frames (usually just one) with the same input. Present the video + audio from the last frame.

- Synchronize. (Some emulators can get very fancy; instead of just waiting for vsync, they'll also delay until the end of the window minus estimated processing time, to poll input at the last possible moment.)

- Roll back to the saved snapshot. (I believe you can also optimize if you know the inputs really didn't change, avoiding a lot of rollbacks at the cost of less predictable frame times.)
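The steps above can be sketched roughly like this. The `Core` class and its `serialize`/`unserialize` methods are made up for illustration; a real core would snapshot raw machine state and present actual video/audio, not a dict:

```python
import copy

class Core:
    """Toy stand-in for an emulator core (hypothetical interface)."""
    def __init__(self):
        self.state = {"frame": 0, "x": 0}

    def run_frame(self, buttons):
        # Stand-in for emulating one frame of the machine.
        self.state["frame"] += 1
        if "right" in buttons:
            self.state["x"] += 1

    def serialize(self):
        return copy.deepcopy(self.state)

    def unserialize(self, snap):
        self.state = copy.deepcopy(snap)


def runahead_frame(core, buttons, n=1):
    # 1. Run one frame normally with the real polled input, then snapshot.
    core.run_frame(buttons)
    snap = core.serialize()
    # 2. Run n more frames assuming the input doesn't change.
    for _ in range(n):
        core.run_frame(buttons)
    presented = core.serialize()  # stands in for "present the last frame's video+audio"
    # 3. Synchronize (vsync / late input polling would happen here).
    # 4. Roll back so the next real frame starts from the snapshot.
    core.unserialize(snap)
    return presented
```

With `n=1`, the frame you show is one frame ahead of the state you actually keep, which is where the perceived latency reduction comes from.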

The main reason this is even a good idea is that most games have a frame or two of their own processing latency by design, so jumping a frame or two ahead usually doesn't have any noticeable side effects. It's a pretty cool idea, since modern computers with LCD screens obviously have a lot more latency basically everywhere compared to older, simpler machines connected to CRTs.

Unfortunately, this sort of approach only works when your emulator's state is small enough and fast enough to restore.

I've actually been dying to experiment with designing an emulator around fast incremental snapshots from the ground up, to see whether this could be made feasible for more modern consoles. You could, for example, track dirty memory pages with userfaultfd (Linux) or MEM_WRITE_WATCH (Windows), and design structures like JIT caches to survive a rewind without dropping the entire cache. I'm actually not sure whether all emulators clear their caches upon loading state, but more generally, I'd like to know how fast and small save states could get if you designed for that from the start.
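A toy sketch of the dirty-page idea: only pages written since the last snapshot get copied, and rollback only restores pages dirtied since then. Everything here is invented for illustration; a real implementation would get the dirty set from the OS (userfaultfd, GetWriteWatch, or mprotect + fault handler) instead of instrumenting writes:

```python
PAGE = 4096

class WatchedMemory:
    """Toy guest RAM that tracks which pages were written since the last snapshot."""
    def __init__(self, size):
        self.mem = bytearray(size)
        self.dirty = set()  # page indices written since the last snapshot
        # Baseline copy of every page; later snapshots only refresh dirty ones.
        self.pages = {p: bytes(PAGE) for p in range(size // PAGE)}

    def write(self, addr, data):
        self.mem[addr:addr + len(data)] = data
        for page in range(addr // PAGE, (addr + len(data) - 1) // PAGE + 1):
            self.dirty.add(page)

    def snapshot(self):
        # Incremental: only copy pages touched since the previous snapshot.
        for page in self.dirty:
            self.pages[page] = bytes(self.mem[page * PAGE:(page + 1) * PAGE])
        self.dirty.clear()

    def rollback(self):
        # Restore only the pages dirtied since the snapshot.
        for page in self.dirty:
            self.mem[page * PAGE:(page + 1) * PAGE] = self.pages[page]
        self.dirty.clear()
```

The appeal is that both snapshot and rollback cost O(dirty pages) rather than O(total RAM), which is what you'd need for run-ahead on consoles with hundreds of megabytes of state.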

What do you mean by JIT cache in this case -- like an inline data cache, or like a cache for the jitted binary code?

The latter, though I'm just using it as an example really. I think for most emulator core designs today you wind up dumping a lot of ephemeral state when you rewind or load state (and sometimes even saving state has some overhead due to synchronization.)

I think this is kind of the inverse of how some emulators that add remote play deal with network lag: they predict what input the opponent will give for the current frame and display the result, but when they actually receive the real opponent input some number of frames later, they go back to that frame, re-calculate knowing the actual input, and then catch up to the current frame. That way the game isn't waiting on remote inputs to refresh the screen, but can reconcile once all the information is known.
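The predict/reconcile loop described above can be sketched like this. The common prediction heuristic (assumed here) is "the remote player repeats their last known input"; the `step` function and all names are made up for illustration:

```python
import copy

class RollbackSession:
    """Toy rollback netcode: predict the remote input, reconcile when it arrives."""
    def __init__(self, state):
        self.state = state        # e.g. {"p1": 0, "p2": 0}
        self.history = []         # per frame: (snapshot, local_input, remote_input)
        self.last_remote = 0      # last confirmed remote input

    @staticmethod
    def step(state, local, remote):
        # Stand-in for one frame of game logic.
        state["p1"] += local
        state["p2"] += remote

    def advance(self, local_input):
        # Predict: assume the remote player repeats their last known input.
        predicted = self.last_remote
        self.history.append((copy.deepcopy(self.state), local_input, predicted))
        self.step(self.state, local_input, predicted)

    def confirm_remote(self, frame, actual):
        snap, local, predicted = self.history[frame]
        self.last_remote = actual
        if actual == predicted:
            return  # prediction was right; nothing to redo
        # Mispredicted: roll back to that frame and re-simulate to the present.
        # (Simplification: later frames are replayed with the new input as the
        # prediction; a real implementation keeps per-frame predicted inputs.)
        self.state = copy.deepcopy(snap)
        self.history[frame] = (snap, local, actual)
        self.step(self.state, local, actual)
        for i in range(frame + 1, len(self.history)):
            _, l, _ = self.history[i]
            self.history[i] = (copy.deepcopy(self.state), l, actual)
            self.step(self.state, l, actual)
```

The screen never stalls waiting on the network; it only has to "pay" for a rollback on the frames where the prediction was wrong.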

Interesting; which emulator(s) was this for? I have to imagine this strategy is mostly effective below a certain total complexity level -- however you want to define that -- and the more modern you get, the more you're lucky to even render a frame at all.

Sounds a lot like speculative execution for hardware.

Was just thinking the same, this is effectively branch prediction/precalculation.

Umm, that's brilliant. Is this technique employed in all modern emulators, or is it a recent thing?