Another trick that might work:
https://gcc.gnu.org/onlinedocs/gcc/Global-Register-Variables...
As the docs say, it isn't easy to use safely, but it might speed up your CPU emulator.
This trick used to be common in VMs that served as compiler targets. It was also sometimes used by compilers that emitted C (for portability): some preprocessor magic enabled it on certain common targets, so the generated C code compiled to faster machine code there.
The implementation strategy here would be something like:
- keep the x86 registers (8 GPRs + PC + flags) in a struct somewhere when executing outside the CPU core
- on entry to the CPU core, copy from the struct into global register variables
- on exit from the CPU core, copy them back
- everything inside the CPU core then uses the global register variables => no load/store code and no code for calculating the memory addresses of the x86 registers
- the way operand read/write works in the CPU emulator would have to be changed (probably)
- the entire structure of the CPU emulator would have to be changed (probably), so that each opcode value, and each modrm byte value, gets its own C code
- you might need to use some compiler magic to force the CPU core to use tail calls and to avoid tail merging
You can always prototype it for a few x86 instructions and see what it would look like. No need to attempt to switch the entire thing over unless you like the prototype.
Computed gotos to help with the tail calls: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
Disable tail merging: -fno-tree-tail-merge
If you want to go really crazy, you can have a faster handling of prefixes:
- instruction interpretation doesn't check for prefixes up front; it just dispatches through a jump table
- for a REP/REPNE/CS/ES/SS prefix, set a flag and dispatch the next byte through a second jump table. The code reached that way is general enough to read the override flags and access memory correctly
- the normal code does not look at the segment override flags and does not have that flexibility
So, two versions of each opcode implementation with two versions of the memory access code: with and without segment override handling.
You can use the same C code for both; you just need to put it in an include file and include it twice (and set some #defines before each include).
There is zero reason for the CPU emulation to be slow, especially since you aren't aiming for cycle accuracy.
Again, this is something you can play around with and prototype for a few x86 instructions first.
Even if you don't want to change your emulator in these directions, you could still learn some practical C tricks by writing the prototype code.