> your python code rapidly resembles a lower level language and not just scripting

I thought the point of numeric processing frameworks and languages in general was that if you can express things as common math equations, geniuses will go in and implement the hyper-optimal solutions for you, because they're extremely common. If anything, it should resemble scripting even more: you want to match the structured form as closely as possible so the 'compiler' (or in this case the backend C libraries) can do the heavy lifting for you.

Yeah, that's not reality. You often hear people say that neural nets are just linear algebra. That isn't really true anymore if you're going for peak performance: there's also a lot of data handling (e.g. tensor movement, KV caching) and distributed communication that needs to happen.
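To make the KV caching point concrete, here's a rough sketch in plain Python. The `KVCache` class and `attend` function are names I made up for illustration, not any framework's API, and a real implementation would use batched GPU tensors across many heads and layers. The thing to notice is that only `attend` is math; the cache itself is slot management and copying.

```python
import math


class KVCache:
    """Preallocated key/value store for one attention head (illustrative only)."""

    def __init__(self, max_len, dim):
        # Allocate every slot up front so decoding never has to grow the buffers.
        self.keys = [[0.0] * dim for _ in range(max_len)]
        self.values = [[0.0] * dim for _ in range(max_len)]
        self.length = 0  # number of slots currently filled

    def append(self, k, v):
        # Pure bookkeeping: copy the new token's key and value into the next slot.
        if self.length >= len(self.keys):
            raise IndexError("KV cache is full")
        self.keys[self.length] = list(k)
        self.values[self.length] = list(v)
        self.length += 1


def attend(q, cache):
    """Softmax attention for a single query over the cached prefix."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [
        scale * sum(a * b for a, b in zip(q, cache.keys[i]))
        for i in range(cache.length)
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    out = [0.0] * len(q)
    for i, w in enumerate(weights):
        for j in range(len(q)):
            out[j] += (w / total) * cache.values[i][j]
    return out
```

Each decoding step calls `append` once for the new token and then `attend` over everything stored so far, so the keys and values of earlier tokens are never recomputed.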

Ah, I see. My recent foray into ML has mostly concentrated on theoretical models (transformers obviously, but also Mamba, SSMs, etc.) and kernel generation frameworks (such as ThunderKittens and Triton), not really on the system architecture level.

I've implemented KV caching in C++ and seen it implemented in Python, so I see your point.

I haven't done any large-scale training or inference either. That's cool, though: if the model can't even fit onto a single GPU, I can see how memory communication becomes a significant issue, since you'd have to manage it from Python if you're managing Python kernels. (Though you technically could just push all the responsibility down to the lower levels yet again... not a good idea, and it muddles responsibilities.)