Because there's a lot of work behind printf that the LLM doesn't need or care to reproduce.

You're not just using the language, but also all of the runtime and libraries behind it.

Thinking it's more efficient for the LLM to reinvent it all is just silly.

Right, and all of that in the library is built to be legible to the human programmer, with constraints imposed so that it fits within the syntax of the underlying language. Imagine how efficient a function could be if it didn't need all of that window dressing. You could "grow" functions out of simulation and bootstrapping and treat them as black boxes that we harvest output from, not much different from using an organism in a bioreactor to yield some metabolite of interest: we might not know all the relevant pieces of the biochemical pathway, but we score putative production mutants on yield alone.
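
To make the "score on yield alone" idea concrete, here's a rough toy sketch in Python. Everything in it is made up for illustration: the four-op instruction set, the `evolve` loop, and the target function are assumptions of mine, not any real system. It just mutates an opaque instruction list and keeps whichever mutant scores at least as well on output, without ever looking at why it works.

```python
import random

# Hypothetical toy instruction set: each op maps an accumulator to a new value.
OPS = {
    "inc": lambda a: a + 1,
    "dec": lambda a: a - 1,
    "dbl": lambda a: a * 2,
    "neg": lambda a: -a,
}


def run(program, x):
    """Execute the opaque program on input x and return the accumulator."""
    acc = x
    for op in program:
        acc = OPS[op](acc)
    return acc


def score(program, target, inputs):
    """Score on output alone: negative total absolute error (0 is perfect)."""
    return -sum(abs(run(program, x) - target(x)) for x in inputs)


def mutate(program, rng):
    """Return a copy with one randomly chosen instruction replaced."""
    child = list(program)
    child[rng.randrange(len(child))] = rng.choice(list(OPS))
    return child


def evolve(target, inputs, length=4, generations=500, seed=0):
    """Hill-climb over programs, keeping any mutant that scores no worse."""
    rng = random.Random(seed)
    best = [rng.choice(list(OPS)) for _ in range(length)]
    best_score = score(best, target, inputs)
    history = [best_score]
    for _ in range(generations):
        child = mutate(best, rng)
        child_score = score(child, target, inputs)
        if child_score >= best_score:  # selection looks only at the "yield"
            best, best_score = child, child_score
        history.append(best_score)
    return best, best_score, history


if __name__ == "__main__":
    # Assumed target behaviour for the demo: f(x) = 2x + 2.
    program, final_score, _ = evolve(lambda x: 2 * x + 2, range(-5, 6))
    print(program, final_score)
```

The selection step never inspects the program's structure, only its score, which is the bioreactor analogy in miniature. Whether this scales beyond toy problems is a separate question.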

Indeed. And aside from that, LLMs generalise poorly out of distribution (OOD). There's relatively little training data containing complex higher-order constructs in straight assembly compared to, say, Python code. Plus, the assembly will be specific to the target architecture.