By not doing it. The ideology here is that, as far as embedded systems are concerned, general-purpose computing took numerous wrong turns from the 1950s to the present.

I thought this through back when I was doing embedded projects with the AVR-8, namely display controllers for persistence-of-vision displays. Something like this doesn't have an OS, so there is no context switching to do on the OS's behalf.

It was practical to write C code for this, but I didn't really like it, because code like this doesn't need the stack or the other affordances that C calling conventions provide. The data structures needed to display a scene are dynamic only at the scope of the scene (they live as long as the scene does), and you have 32 registers, which is a lot: enough that you can allocate 8 to the interrupt handler and still have plenty left over for the main loop.

I was wargaming my paths forward in case I needed more power. The obvious route, which I probably would have taken, is the portable C route via ARM (an STM32, say). Yet I liked the AVR-8 a lot, and I also considered going to an FPGA board on which you could instantiate an AVR-8 soft core clocked higher than any real hardware AVR-8 and put an accelerator behind it.

The FPGA + TTA (transport-triggered architecture) + co-designed software route came up at this point. Notably, any kind of concurrency, parallelism and extra context can be baked into the "hardware". Adding a few registers is much cheaper than adding superscalar features, and adding another MOV slot to the instruction word is pretty cheap if you want more parallelism, with the caveat that it could be hard to prevent blocking. If the requirements change, it's a frickin' FPGA: you can add something to it or take something away.
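
To make the "everything is a MOV" idea concrete, here is a minimal sketch in C of a toy transport-triggered machine. Everything in it is an assumption made up for illustration: the port names (`ADD_A`, `ADD_T`, `ADD_R`), the two-slot instruction word and the `tta_add` helper are hypothetical and don't correspond to any real core or toolchain. The only instruction is a move between ports, the adder fires as a side effect of a move into its trigger port, and `SLOTS` is the knob you would turn to add another MOV slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port map for a toy transport-triggered machine. */
enum {
    R0, R1, R2, R3, /* general-purpose registers */
    ADD_A,          /* adder operand port: a plain latch */
    ADD_T,          /* adder trigger port: a move into it fires the add */
    ADD_R,          /* adder result port */
    NUM_PORTS
};

#define SLOTS 2 /* moves issued per instruction; raise for more parallelism */

typedef struct { uint8_t src, dst; } Move;
typedef struct { Move slot[SLOTS]; } Instr;
typedef struct { uint8_t port[NUM_PORTS]; } Machine;

/* Execute one instruction: all of its moves happen "at the same time". */
static void step(Machine *m, const Instr *in)
{
    uint8_t val[SLOTS];

    /* Read every source first, so no slot sees another slot's write. */
    for (int s = 0; s < SLOTS; s++) {
        assert(in->slot[s].src < NUM_PORTS && in->slot[s].dst < NUM_PORTS);
        val[s] = m->port[in->slot[s].src];
    }
    /* Then latch every destination. */
    for (int s = 0; s < SLOTS; s++)
        m->port[in->slot[s].dst] = val[s];
    /* Finally, fire the adder if any slot wrote its trigger port. */
    for (int s = 0; s < SLOTS; s++)
        if (in->slot[s].dst == ADD_T)
            m->port[ADD_R] = (uint8_t)(m->port[ADD_A] + m->port[ADD_T]);
}

static void run(Machine *m, const Instr *prog, size_t n)
{
    for (size_t i = 0; i < n; i++)
        step(m, &prog[i]);
}

/* Compute a + b (mod 256) using nothing but moves. */
static uint8_t tta_add(uint8_t a, uint8_t b)
{
    Machine m = {0};
    m.port[R1] = a;
    m.port[R2] = b;

    const Instr prog[] = {
        /* Both operands travel in one instruction; the add fires. */
        { { { R1, ADD_A }, { R2, ADD_T } } },
        /* Collect the result; the second slot is a harmless R0 -> R0 move. */
        { { { ADD_R, R3 }, { R0, R0 } } },
    };
    run(&m, prog, sizeof prog / sizeof prog[0]);
    return m.port[R3];
}
```

Calling `tta_add(2, 3)` returns 5 after two instructions. With `SLOTS` set to 1 the same addition would need three instructions, which is the parallelism-for-width trade described above. The sketch dodges the blocking problem by having only one function unit, so two slots never compete for the same trigger port.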

What would put the whole idea on wheels is a superoptimizing compiler that could design both the CPU and the code that runs on top of it.