Technically it's running software on the programmable I/O, but that software is just a loop of four outputs that advances when it gets a 1 bit and doesn't advance when it gets a 0 bit. It feels like the hardware that manages the buffer and turns it into a high speed serial stream is doing the more important work here.

And the CPU that's actually deciding on the bits doesn't have to bit-bang them with precise timing; it just has to put them into that buffer.
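
To make that split concrete, here's a rough Python model of the idea, not the actual PIO program. The four output values in `OUTPUTS`, the function name `run_state_machine`, and the choice to advance before driving the output are all my own assumptions for illustration; the real patterns and ordering aren't spelled out above.

```python
from collections import deque

# Placeholder output patterns (assumed); the real four values are not given here.
OUTPUTS = (0b0001, 0b0010, 0b0100, 0b1000)


def run_state_machine(fifo, start=0):
    """Drain the buffer one bit per tick and return what was driven on each tick.

    A 1 bit advances to the next output in the loop, a 0 bit holds the current one.
    Advancing before driving the output is an assumption of this sketch.
    """
    position = start
    driven = []
    while fifo:
        bit = fifo.popleft()
        if bit:
            position = (position + 1) % len(OUTPUTS)
        driven.append(OUTPUTS[position])
    return driven


# The "CPU" side: decide on the bits and drop them in the buffer, no timing involved.
fifo = deque([1, 0, 1, 1, 0, 1])
trace = run_state_machine(fifo)
print([format(value, "04b") for value in trace])
```

In the real thing the while loop is what the hardware does at a fixed rate; the only part the CPU owns is filling the deque.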