Hacker News

My English is pretty bad; it takes time to write a post that isn't some sort of 'pigeon English' :)

Regarding the other commenters asking about LLMs: sure, they're used; nowadays it's stupid to pass up the opportunities that LLMs provide. But around 60% of the code was written by me, and the CPU emulation was written by the famous Mike Chambers for the fake86 emulator. Sometimes, when the code looks rough or stupid (why no structs, folks?) -- keep in mind it targets the very limited CPU power and MUCH more limited RAM of the RP2040. Also, the mess of .inl.c files exists because we need to maximize function inlining; each function call on the Pico takes an abnormal amount of CPU cycles.

While the RP2350 was something of a game changer, resources are still limited.

JdeBP 2 days ago

If it helps, the word is "pidgin".

* https://ru.wiktionary.org/wiki/pidgin#%D0%90%D0%BD%D0%B3%D0%...

xrip 2 days ago

[dead]

peterfirefly 2 days ago

> My English is pretty bad; it takes time to write a post that isn't some sort of 'pigeon English' :)

Why not work on it? Isn't there a fairly substantial return on investment for someone in Russia who does that? This is not meant as a put down or an insult. Just general befuddlement because it seems like a no-brainer to me.

---

So cpu.[ch] are from fake86 -- via faux86 or straight from fake86? What version did you fork? That is something that would be good to put in the readme.

You wrote the files in the drivers/ directory? And LinuxMiniFB.c, WinMiniFB.c, {linux|win}-main.cpp, pico-main.c? And the audio code?

Tell people what you wrote. You get better feedback that way.

> Sometimes, when the code looks rough or stupid (why no structs, folks?) -- keep in mind it targets the very limited CPU power and MUCH more limited RAM of the RP2040.

This is what Wikipedia says about the RP2040:

    Dual ARM Cortex-M0+ cores (ARMv6-M instruction set), Originally run at 133 MHz,[2] but later certified at 200 MHz[16]
    Each core has an integer divider peripheral and two interpolators
    264 KB SRAM in six independent banks (four 64 KB, two 4 KB)

That is indeed tiny...

It has never been the structs that slowed my code down. structs are free. Passing them into functions with or without pointers is usually free (or cheaper than free!) with modern C compilers -- but that was not the case until 5-10 years ago, I think.

I haven't checked if Mike Chambers's original CPU emulator was "struct free". Maybe it was and you just inherited that "feature" ;)

> Also, the mess of .inl.c files exists because we need to maximize function inlining; each function call on the Pico takes an abnormal amount of CPU cycles.

Doesn't link-time optimization (LTO) solve that for you? It looks like you use it in CMakeLists.txt:

    add_link_options(-flto -fwhole-program)
    # -frename-registers -fno-tree-vectorize
    add_compile_options(-flto=auto -frename-registers -fomit-frame-pointer -fwhole-program -ffreestanding -ffast-math -ffunction-sections -fdata-sections -fms-extensions -O2)

You could also try 'inline __attribute__((always_inline))'.
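A minimal sketch of what that could look like. The helper names (`parity_even`, `add8`) are made up for illustration and are not from the repo; the attribute itself is the GCC/Clang `always_inline` function attribute:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the byte has an even number of set bits.
 * always_inline asks GCC/Clang to inline it even where the usual heuristics
 * would decide against it. */
static inline __attribute__((always_inline)) uint8_t parity_even(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (uint8_t)(~v & 1u);
}

/* Hypothetical opcode helper: 8-bit ADD that also reports carry and parity. */
static inline __attribute__((always_inline))
uint8_t add8(uint8_t a, uint8_t b, uint8_t *carry, uint8_t *parity)
{
    assert(carry && parity);
    uint16_t wide = (uint16_t)(a + b);      /* keep the 9th bit for the carry */
    *carry  = (uint8_t)(wide >> 8);
    *parity = parity_even((uint8_t)wide);
    return (uint8_t)wide;
}
```

Whether this actually beats what -O2 plus LTO already does is something to measure on the target, not assume.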

How do you profile code on a Raspberry Pi Pico?

Btw, the redirector interface comes in slightly different versions (different struct sizes in different DOS versions). Which DOS versions does your redirector work with?

xrip 2 days ago

> Why not work on it? Isn't there a fairly substantial return on investment for someone in Russia who does that? This is not meant as a put down or an insult. Just general befuddlement because it seems like a no-brainer to me.

I have no problems reading and listening in English, but my writing is bad because of a lack of practice.

> So cpu.[ch] are from fake86 -- via faux86 or straight from fake86? What version did you fork? That is something that would be good to put in the readme.

I think it's a mixture of all the available versions, but the base is from https://github.com/lgblgblgb/fake86

The RP2040 drivers are mostly by the murmulator community, with the initial code from https://github.com/AlexEkb4ever, the creator of the murmulator devboard hardware. They have also been rewritten and improved by me, but the initial versions are from Alex.

My code, or often a deep rewrite of others' code, is in ./src/, except emu8950 and cpu.(c|h). MiniFB for Win32 is a stripped-down version of the minifb that's lying around on GitHub; linuxminifb is my own implementation for Linux.

About structs and so on: I'm not a C coder at all; I started with C from scratch two years ago in my spare time :) So this project, if you trace it from the initial commit until now, is a mirror of my growth as a C coder :) Also, about the Linux/Win32 versions: the Win32 version is used for overall algorithm debugging. The Linux version is just 'because why not?'

TypeScript is my everyday toy and tool (memory economy? wasted CPU cycles... huh!).

The network redirector uses DOS >= 4 structs; the main differences between DOS versions are the CDS struct and the SDA, which can easily be changed.

There is a pre-configured boot FDD0 image, which should be used to get the best emulator configuration and performance.

peterfirefly a day ago

> I have no problems reading and listening in English, but my writing is bad because of a lack of practice.

It's worse than my German, and I have the excuse of German having lots of cases and inflections (but fewer than Russian). English is like a toddler language in comparison ;)

Best of luck with your English practice.

Put some text about the "source code sources" in your docs. Just take what you wrote in the comments here.

Same goes for the redirector/DOS versions.

Structs are essential to good, clean coding in C (and most other languages). Modern compilers are good at handling structs being passed in and returned from functions, especially for small inlined functions. They are also good at handling pointers to structs being passed into functions, especially for small inlined functions. Play around with gcc/clang + either the -S option (to generate assembly output and then stop) or with objdump or some other disassembler. Or use godbolt (Compiler Explorer). You'll be amazed at how efficient the code is.
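Something small like this is enough to experiment with. The `far_ptr` type and the function names are invented for the example, not taken from the emulator. Compile it with something like `gcc -O2 -S` and read the output for `next_linear`; I would expect the struct to stay in registers at that optimization level, but check what your own compiler and target produce:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment:offset pair, for illustration only. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Struct passed in and returned by value. */
static inline far_ptr far_ptr_add(far_ptr p, uint16_t delta)
{
    p.offset = (uint16_t)(p.offset + delta);   /* offset wraps at 64 KB */
    return p;
}

/* Struct passed by pointer. */
static inline uint32_t far_ptr_linear(const far_ptr *p)
{
    assert(p != NULL);
    return ((uint32_t)p->segment << 4) + p->offset;
}

/* Not static, so it shows up under its own name in the assembly output. */
uint32_t next_linear(uint16_t segment, uint16_t offset)
{
    far_ptr p = { segment, offset };
    p = far_ptr_add(p, 1);
    return far_ptr_linear(&p);
}
```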

It's probably a good idea to create a number of short instruction traces, maybe just a thousand instructions each, and figure out a way to build a program that runs them and times each of them. If you can also enable profiling on your Raspberry Pi Pico target so you can see where each trace spent most of its time, it would likely be very useful.
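A rough sketch of such a harness, under stated assumptions: `trace`, `trace_runner`, `time_trace` and `count_bytes_runner` are all hypothetical names, and the runner here is only a stand-in where the real CPU core would be called. It uses the standard `clock()` so it runs on a desktop build; on the Pico you would presumably swap in an SDK timer such as `time_us_64()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical: executes a block of raw x86 bytes and returns the number of
 * instructions executed. The real emulator core would sit behind this. */
typedef size_t (*trace_runner)(const uint8_t *code, size_t len);

typedef struct {
    const char    *name;
    const uint8_t *code;
    size_t         len;
} trace;

typedef struct {
    size_t instructions;
    double seconds;
} trace_result;

/* Runs one trace `repeats` times and reports total instructions and CPU time. */
static trace_result time_trace(const trace *t, trace_runner run, unsigned repeats)
{
    assert(t && run && repeats > 0);
    trace_result r = { 0, 0.0 };
    clock_t start = clock();
    for (unsigned i = 0; i < repeats; i++)
        r.instructions += run(t->code, t->len);
    r.seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return r;
}

/* Stand-in runner for the sketch: pretends every byte is a one-byte
 * instruction (true for a trace made only of NOPs, 0x90). */
static size_t count_bytes_runner(const uint8_t *code, size_t len)
{
    (void)code;
    return len;
}
```

Calling `time_trace(&t, count_bytes_runner, 1000)` for each trace and printing `instructions / seconds` gives a number to compare before and after every change.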

What's your roadmap for the project? Just tinkering? Becoming a better C programmer? Becoming better at embedded programming? Better at ARM32? "Quality of life" improvements that make it easier to use the emulator? Better emulation? Specific games/apps you want to work well?

duskwuff 2 days ago

> That is indeed tiny...

Even RP2040 is fairly large as far as microcontrollers go. The widely used STM32F103 is a single 72 MHz Cortex-M3 core with 20 KB SRAM and 64 KB flash, for example. Even smaller parts aren't uncommon.

peterfirefly a day ago

I know. But they won't run PC emulators unless the programmer is truly heroic.

I have worked on a slower Cortex-M3 than that. I've also worked on 8051 variants and on ST-62 (called "ST6 architecture" in the link below). My first computer was a ZX81 with a 16KB RAM pack.

https://en.wikipedia.org/wiki/ST6_and_ST7

peterfirefly 10 hours ago

Another trick that might work:

https://gcc.gnu.org/onlinedocs/gcc/Global-Register-Variables...

As the docs say, it isn't easy to use it in a safe way but it might speed up your CPU emulator.

This trick used to be common in VMs that were compiler targets. It was also sometimes used by compilers that compiled to C (for portability). There would be some preprocessor magic that enabled it on certain common targets so the C code would compile to faster code.

The implementation strategy here would be something like:

  - keep the x86 registers (8 GPRs + PC + flags) in a struct somewhere when executing outside the CPU code

  - on entry to the CPU core, copy from the struct into global register variables

  - on exit from the CPU core, copy them back

  - everything inside the CPU core then uses the global register variables => no load/store code needed, no code for calculating memory addresses of the x86 registers needed

  - the way operand read/write works in the CPU emulator would have to be changed (probably)

  - the entire structure of the CPU emulator would have to be changed (probably), so each opcode value would have its own C code, and each modrm byte value would also have its own C code

  - you might need to use some compiler magic to force the CPU core to use tail calls and to avoid tail merging

You can always prototype it for a few x86 instructions and see what it would look like. No need to attempt to switch the entire thing over unless you like the prototype.

Computed gotos to help with the tail calls: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

Disable tail merging: -fno-tree-tail-merge
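A prototype of the dispatch part could start out like the sketch below. Assumptions, loudly: `mini_cpu` and `run_core` are invented names, only three opcodes are handled, and plain local variables stand in for the global register variables (locals are portable; the compiler can still keep them in host registers, but nothing forces it to). It needs GCC or Clang because of labels-as-values and range designators:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, tiny subset of the guest state. */
typedef struct {
    uint16_t ax;
    uint16_t cx;
    uint16_t ip;
} mini_cpu;

/* Handles 0x40 (INC AX), 0x41 (INC CX) and 0x90 (NOP). Every other opcode,
 * including HLT (0xF4), stops the core. Returns the number of instructions
 * executed before stopping. */
static unsigned run_core(mini_cpu *cpu, const uint8_t *mem, size_t mem_size)
{
    assert(cpu && mem && mem_size > 0);

    /* One entry per opcode byte; each handler jumps straight to the next. */
    static void *const dispatch[256] = {
        [0x00 ... 0x3F] = &&op_stop,
        [0x40]          = &&op_inc_ax,
        [0x41]          = &&op_inc_cx,
        [0x42 ... 0x8F] = &&op_stop,
        [0x90]          = &&op_nop,
        [0x91 ... 0xFF] = &&op_stop,
    };

    /* On entry: copy guest registers out of the struct into locals. */
    uint16_t ax = cpu->ax, cx = cpu->cx, ip = cpu->ip;
    unsigned executed = 0;

#define DISPATCH() do {                       \
        if (ip >= mem_size) goto op_stop;     \
        goto *dispatch[mem[ip++]];            \
    } while (0)

    DISPATCH();

op_inc_ax: ax++; executed++; DISPATCH();
op_inc_cx: cx++; executed++; DISPATCH();
op_nop:    executed++; DISPATCH();

op_stop:
    /* On exit: copy the locals back into the struct. */
    cpu->ax = ax;
    cpu->cx = cx;
    cpu->ip = ip;
    return executed;
#undef DISPATCH
}
```

Each handler ends with its own indirect jump, which is the property tail merging would undo.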

If you want to go really crazy, you can have a faster handling of prefixes:

  - instruction interpretation doesn't look for prefixes, it just uses a jump table

  - in case of REP/REPNE/CS/ES/SS, set a flag + use a jump table on the next byte.  The code jumped to is general enough to read the override flags and access memory properly

  - the normal code does not look at the segment override flags and does not have that flexibility

So, two versions of each opcode implementation with two versions of the memory access code: with and without segment override handling.

You can use the same C code for both, you just need to put it in an include file and include it twice (and set some #define's before each include).
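To keep the example in one file, the sketch below uses a macro in place of the include file; in a real build the macro body would be the contents of the file that gets included twice, with a #define flipped between the two includes. Everything here (`cpu_state`, `SEG_ES`, the `mov_al_bx_*` functions) is a made-up illustration, not the emulator's code:

```c
#include <assert.h>
#include <stdint.h>

enum { SEG_NONE = 0, SEG_ES = 1 };

/* Hypothetical, minimal guest state. */
typedef struct {
    uint16_t ds, es;
    uint16_t bx;
    int      seg_override;   /* SEG_NONE, or SEG_ES after an ES: prefix */
    uint8_t *ram;            /* 1 MB of guest memory */
    uint8_t  al;
} cpu_state;

/* Stand-in for the twice-included file: one body, two generated functions.
 * WITH_OVERRIDE is a compile-time constant, so the version generated with 0
 * should lose the override check entirely once optimized. */
#define DEFINE_MOV_AL_BX(NAME, WITH_OVERRIDE)                                \
    static void NAME(cpu_state *cpu)                                         \
    {                                                                        \
        assert(cpu && cpu->ram);                                             \
        uint16_t seg = cpu->ds;                                              \
        if (WITH_OVERRIDE && cpu->seg_override == SEG_ES)                    \
            seg = cpu->es;                                                   \
        /* 20-bit physical address, wrapping at 1 MB */                      \
        cpu->al = cpu->ram[(((uint32_t)seg << 4) + cpu->bx) & 0xFFFFFu];     \
    }

DEFINE_MOV_AL_BX(mov_al_bx_fast, 0)       /* normal path: ignores overrides */
DEFINE_MOV_AL_BX(mov_al_bx_override, 1)   /* reached only after a prefix    */
```

The normal jump table would point at `mov_al_bx_fast`, and the table used after a prefix byte at `mov_al_bx_override`.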

There is zero reason to have a slow CPU emulation, especially as you are not doing cycle accuracy.

Again, this is something you can play around with and prototype for a few x86 instructions first.

Even if you don't want to change your emulator in these directions, you could still learn some practical C tricks by writing the prototype code.