PTX assembly (NVIDIA's low-level GPU instruction set). DeepSeek used some of it for a small amount of work that CUDA didn't expose APIs for.

Sadly, it's platform-specific: PTX only targets NVIDIA GPUs.
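
For a rough idea of what dropping down to PTX from CUDA looks like, here's a minimal sketch using inline `asm`. It is not DeepSeek's code, just an illustration: it reads the `%laneid` special register, which CUDA C++ doesn't offer as a built-in variable. The names `lane_id`, `write_lane_ids`, and `lane_ids_match` are made up for this example, and it assumes a 1D block and a warp size of 32.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the %laneid special register (this thread's index within its warp)
// with a single inline PTX instruction.
__device__ unsigned lane_id() {
    unsigned id;
    asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
    return id;
}

// Each thread stores its lane ID at its own index.
__global__ void write_lane_ids(unsigned *out) {
    out[threadIdx.x] = lane_id();
}

// Hypothetical host-side check: with a 1D block and 32-thread warps,
// the lane ID should equal threadIdx.x % 32.
bool lane_ids_match() {
    const int n = 64;
    unsigned h_out[n];
    unsigned *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(unsigned)) != cudaSuccess) return false;
    write_lane_ids<<<1, n>>>(d_out);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != static_cast<unsigned>(i % 32)) return false;
    }
    return true;
}

int main() {
    std::printf("%s\n", lane_ids_match() ? "lane IDs match" : "mismatch or CUDA error");
    return 0;
}
```

Build it with `nvcc`. The `asm` string is PTX, so this won't compile for AMD or Intel GPUs, which is the portability cost.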