That’s clever — am I right to think it’s the intermediate solution between locks and full STM, implemented at the kernel level, and with zero abstraction cost?

It's in some sense a light form of STM. The key insight behind rseq(2) is that if the data is local to a given CPU the only way to get a race is if the kernel deschedules your program from that CPU at an inopportune time. If your operation can be aborted and restarted and the kernel has a mechanism to notify you when that needs to happen you can dispense with the overhead of "real" synchronization and just use a couple mov instructions to enter and exit the critical section.