The problem I see (day to day, working on ML framework optimization) is that it's not just a case of Python calling lower-level compiled code. PyTorch, for example, has a much tighter integration of Python and the low-level functions than that, and it does cause performance bottlenecks. So in theory I agree that using high-level languages to script calls to low-level code is a good idea, but in practice that gets abused to put Python in the hot path. Perhaps if the lower-level language were the bulk of the framework and just called Python for helper functions, we'd see better performance-aware design from developers.

What's the bottleneck? Is it serializing to/from PyObjects over and over for the ML ops? I thought PyTorch was pretty good about this: tensors are views, the computation graph can be executed in parallel, & under the hood you're just calling a bunch of fast linear algebra libraries, etc.
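(For what it's worth, the view behavior really does avoid copies; a minimal sketch, shapes arbitrary:)

```python
import torch

a = torch.arange(12, dtype=torch.float32)
b = a.view(3, 4)    # reshape without copying: b shares a's storage
b[0, 0] = 99.0      # writing through the view...
print(a[0].item())  # ...shows up in the original: 99.0
print(a.data_ptr() == b.data_ptr())  # True: same underlying allocation
```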

If it avoids excessive copying & supports parallel computation, surely it's fine?

If your model is small enough that the overhead of Python starts dominating the execution time, I mean... does performance even matter that much then? And if it's large enough, surely the things I mentioned outweigh the costs?

PyTorch started off with an eager execution model. This means that for every kernel you call from Python, you go back through the interpreter and the dispatcher before the next kernel can be launched, so per-op Python overhead sits in the hot path. torch.compile was introduced to avoid this bottleneck.
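Roughly (a sketch; the function is just a stand-in):

```python
import torch

def f(x):
    # eager mode: each of these lines is a separate trip through the
    # Python interpreter and a separate kernel dispatch
    y = x * 2
    y = y + 1
    return torch.relu(y)

x = torch.randn(1024, 1024)
out_eager = f(x)               # 3 Python -> C++ round trips

f_compiled = torch.compile(f)  # traces f and fuses the ops into fewer kernels
out_compiled = f_compiled(x)   # first call compiles; later calls skip per-op Python dispatch
```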

Ah, I always forget that there are intermediate ops in ML that aren't just matrix multiplies.

A single Python interpreter stack frame into a 10^4 × 10^4 GEMM in a C BLAS kernel is not a bottleneck, but 10^8 Python interpreter stack frames for a pointwise broadcast addition would be (toy benchmark below).
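A toy version of that comparison (sizes shrunk from 10^8 to 10^6 so the loop finishes; exact numbers will vary by machine):

```python
import time
import torch

a = torch.randn(1_000_000)
b = torch.randn(1_000_000)

# one interpreter frame into one vectorized kernel: Python cost is negligible
t0 = time.perf_counter()
c = a + b
t1 = time.perf_counter()

# one interpreter round trip per element: the interpreter,
# not the arithmetic, dominates
out = torch.empty_like(a)
t2 = time.perf_counter()
for i in range(a.numel()):
    out[i] = a[i] + b[i]
t3 = time.perf_counter()

print(f"vectorized: {t1 - t0:.6f}s, per-element Python: {t3 - t2:.2f}s")
```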

Doesn't PyTorch overload common broadcast operations, though? I was under the impression that it did (see the snippet below). I guess this is what `torch.compile` attempts to solve?
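(What I mean by overloaded broadcast ops; as far as I know this dispatches as a single underlying kernel, with no Python in the inner loop:)

```python
import torch

a = torch.randn(1024, 1)
b = torch.randn(1, 1024)

# one Python call; the broadcast to (1024, 1024) happens inside the
# dispatched C++ kernel, without materializing expanded copies of a or b
c = a + b
print(c.shape)  # torch.Size([1024, 1024])
```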

Yep, this is one issue. There are lots of limitations on what you can compile this way, though, and your Python code rapidly comes to resemble a lower-level language and not just scripting (see the sketch below). There are also overheads associated with handling distributed collectives from Python, with multiprocessing for data-loader workers in Python, and with baked-in assumptions in the lower-level libraries that introduce overhead if you can't go in and fix them yourself (in which case you could be coding in C++ anyway).
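A hedged sketch of what I mean (the function is made up): data-dependent Python control flow forces a graph break, and the "compilable" rewrite stops looking like plain scripting.

```python
import torch

@torch.compile
def f(x):
    # data-dependent Python branching: the compiler can't trace through
    # the .item() call without a graph break, so this region falls back
    # to eager (or errors under fullgraph=True)
    if x.sum().item() > 0:
        return x * 2
    return x - 1

# the compiler-friendly rewrite: express the branch as a tensor op instead
@torch.compile
def f_traceable(x):
    return torch.where(x.sum() > 0, x * 2, x - 1)

x = torch.randn(8)
print(f(x), f_traceable(x))
```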

> your Python code rapidly comes to resemble a lower-level language and not just scripting

I thought the point of numeric processing frameworks & languages in general is that if you can express things as common math equations, then geniuses will go in and implement the hyper-optimal solutions for you, because those equations are extremely common. If anything, the code should resemble scripting even more, because you want to match the structured form as closely as possible so the 'compiler' (or in this case the backend C libraries) can do the lifting for you.

Yeah, that's not the reality. You often hear people say that neural nets are just linear algebra. That isn't really true anymore if you're going for peak performance: there's also a lot of data handling (e.g. tensor movement, KV caching) and distributed communication that needs to happen.

Ah, I see. My foray into ML in recent times has mostly concentrated on theoretical models (transformers obviously, but also Mamba, SSMs, etc.) & kernel-generation frameworks (such as ThunderKittens and Triton), not really on the system-architecture level.

I've implemented KV caching in C++ and seen it implemented in Python; I see your point.
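The Python versions I've seen boil down to this kind of bookkeeping (a minimal sketch with made-up shapes and names; real implementations add paging, eviction, and batching):

```python
import torch

class KVCache:
    """Minimal preallocated KV cache for one attention layer (illustrative only)."""

    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int):
        self.k = torch.zeros(max_seq_len, n_heads, head_dim)
        self.v = torch.zeros(max_seq_len, n_heads, head_dim)
        self.pos = 0  # next write position

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        # copy the new tokens' keys/values into the preallocated buffer;
        # this is exactly the Python-side bookkeeping in the hot decode loop
        n = k_new.shape[0]
        self.k[self.pos:self.pos + n] = k_new
        self.v[self.pos:self.pos + n] = v_new
        self.pos += n

    def view(self):
        # views, not copies, of everything cached so far
        return self.k[:self.pos], self.v[:self.pos]
```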

No large-scale training & inference either; that's where it gets interesting, if the model can't even fit onto a single GPU. I can see how memory movement and communication can become a significant issue, since you'd have to manage that through Python if you're launching kernels from Python. (Though you technically could just throw all that responsibility down to the lower levels yet again... not a good idea, though, and it pollutes responsibilities.)
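Something like this naive pattern is what I'm picturing (a sketch only; it assumes a `torch.distributed` process group has already been initialized, and real frameworks bucket gradients to amortize the per-call overhead):

```python
import torch
import torch.distributed as dist

def allreduce_grads(model: torch.nn.Module) -> None:
    # every iteration here is a Python-level call standing between
    # the backward pass and the NCCL/Gloo communication it triggers
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
```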

> but in practice that gets abused to put Python in the hot path

But if that's an abuse of the tools (which I agree it is), how does that make it the fault of the language rather than of the user or the package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages), rather than the other way around? And so in your example, wouldn't the solution just be to address the abuse in PyTorch rather than throw away the entire universe within which it's already functionally working?

The problem is that Python allows people to be lazy and ignore subtle performance issues; that's much harder in a lower-level language. Obviously the tradeoff is that it would slow down (or completely stop) some developers. I'm really just wondering out loud whether the constraints of a lower-level language would help people write better code in this case, and whether that trade-off would be worth it.