I am looking for a plain and simple LLM inference implementation written in C, and/or in x86_64 assembly, and/or in AMD GPU RDNA assembly.

Anybody?

I heard once that C++ can become assembly at some point if you type the right things in. :)