Lots of people argue that AI R&D is currently done in Python because of the benefits of the rich library ecosystem. This makes me realize that's actually a poor reason for everything to be in Python, since the genuinely useful libraries for things like visualization could easily be called from lower level languages if they're off the hot path.

> could easily be called from lower level languages

Could? Yes. Easily? No.

People write their business logic in Python because they don't want to code in those lower-level languages unless they absolutely have to. The article neatly shows the kind of additional coding overhead you'd have to deal with - and you're not getting anything back in return.

Python is successful because it's a high-level language which has the right tooling to create easy-to-use wrappers around low-level high-performance libraries. You get all the benefits of a rich high-level language for the cold path, and you only pay a small penalty over using a low-level language for the hot path.

The problem I see (day to day, working on ML framework optimization) is that it's not just a case of Python calling lower level compiled code. PyTorch, for example, has a much closer integration of Python and the low level functions than that, and it does cause performance bottlenecks. So in theory I agree that using high level languages to script calls to low level code is a good idea, but in practice that gets abused to put Python in the hot path. Perhaps if the lower level language were the bulk of the framework and just called Python for helper functions, we'd see more performance-aware design from developers.

What's the bottleneck? Is it serializing to/from PyObjects over and over for the ML ops? I thought PyTorch was pretty good about this: tensors are views, the computation graph can be executed in parallel, you're just calling a bunch of fast linear algebra libraries under the hood, etc.

If it avoids excessive copying & supports parallel computation, surely it's fine?

If your model is small enough that the overhead of Python starts dominating the execution time, I mean... does performance even matter that much then? And if it's large enough, surely the things I mentioned outweigh the costs?

PyTorch started off with an eager execution model. This means that for every kernel you call from Python, you have to wait for the kernel to finish and then return to Python to launch the next one. torch.compile was introduced to avoid this bottleneck.
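Roughly, the difference looks like this (an illustrative sketch using the public PyTorch 2.x API; shapes and ops are arbitrary):

```python
import torch

def f(x):
    # In eager mode each of these lines is a separate kernel launch,
    # with a Python-level round trip between launches.
    y = x * 2.0
    y = y + 1.0
    return torch.relu(y)

x = torch.randn(4096, 4096)

eager_out = f(x)              # three launches, three round trips

f_compiled = torch.compile(f) # traces f and fuses the pointwise chain
compiled_out = f_compiled(x)  # far fewer launches on subsequent calls

torch.testing.assert_close(eager_out, compiled_out)
```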

Ah, I always forget that there are intermediates that aren't just matrix multiplies in ML.

A single Python interpreter stack frame into a 10^4 * 10^4 GEMM C BLAS kernel is not a bottleneck, but going through 10^8 Python interpreter stack frames for a pointwise addition broadcast op would be.
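To put rough numbers on that (a hedged, illustrative benchmark; absolute timings obviously depend on your hardware):

```python
import time
import torch

a = torch.randn(10_000, 10_000)
b = torch.randn(10_000, 10_000)

# One Python frame amortized over ~2e12 FLOPs of GEMM: the interpreter
# overhead is noise.
t0 = time.perf_counter()
c = a @ b
print(f"one GEMM call: {time.perf_counter() - t0:.3f}s")

# One Python frame per element: interpreter and dispatch cost dominates.
# (Only 1e5 of the 1e8 elements, or this would run for hours.)
fa, fb = a.reshape(-1), b.reshape(-1)
out = torch.empty_like(fa)
t0 = time.perf_counter()
for i in range(100_000):
    out[i] = fa[i] + fb[i]
print(f"1e5 per-element calls: {time.perf_counter() - t0:.3f}s")
```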

Does PyTorch overload common broadcast operations though? I was under the impression that it did as well. I guess this is what `torch.compile` attempts to solve?

Yep, this is one issue. There are lots of limitations to what you can compile this way, though, and your Python code rapidly resembles a lower level language and not just scripting. There are also overheads associated with handling distributed collectives from Python, multiprocessing for data loader workers in Python, and baked-in assumptions in the lower level libraries that introduce overhead if you can't go in and fix them yourself (in which case you could be coding in C++ anyway).
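The data loader point, for instance: even with zero Python in the model's forward pass, the input pipeline below is all Python processes and pickling (a minimal sketch of the standard torch.utils.data API):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    ds = TensorDataset(torch.randn(100_000, 128),
                       torch.randint(0, 10, (100_000,)))
    # Each worker is a separate Python process: samples are fetched and
    # collated in Python, then pickled back across the process boundary,
    # so the data path runs through Python even when the model math doesn't.
    loader = DataLoader(ds, batch_size=256, num_workers=4)
    for xb, yb in loader:
        pass  # training step would go here

if __name__ == "__main__":  # needed for multiprocessing workers on spawn platforms
    main()
```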

> your Python code rapidly resembles a lower level language and not just scripting

I thought the point of numeric processing frameworks & languages in general is that if you can express things as common math equations, then geniuses will go in and implement the hyper-optimal solutions for you, because those equations are extremely common. If anything, your code should resemble scripting even more, because you want to match the structured form as much as possible, so the 'compiler' (or in this case the backend C libraries) can do the lifting for you.

Yeah, that's not the reality. You often hear people say that neural nets are just linear algebra. That isn't really true anymore if you're going for peak performance: there's also a lot of data handling (e.g. tensor movement, KV caching) and distributed communication that needs to happen.

Ah, I see. My foray into ML in recent times has mostly concentrated on theoretical models (transformers obviously, but also Mamba, SSMs, etc.) and kernel generation frameworks (such as ThunderKittens and Triton), not really the system architecture level.

I've implemented KV caching in C++ and seen it implemented in Python, so I see your point.
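For anyone who hasn't seen one: a KV cache is mostly tensor bookkeeping, not matrix math. A minimal single-layer sketch (hypothetical shapes; real implementations add paging, eviction, batching, and so on):

```python
import torch

class KVCache:
    """Minimal single-layer KV cache (illustrative only)."""

    def __init__(self, max_len: int, heads: int, head_dim: int):
        self.k = torch.zeros(max_len, heads, head_dim)
        self.v = torch.zeros(max_len, heads, head_dim)
        self.len = 0

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor):
        # k_new, v_new: (new_tokens, heads, head_dim)
        t = k_new.shape[0]
        self.k[self.len:self.len + t] = k_new
        self.v[self.len:self.len + t] = v_new
        self.len += t
        # Return views over the filled prefix: no copy, just bookkeeping.
        return self.k[:self.len], self.v[:self.len]
```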

No large scale training & inference either, fair enough, especially if the model can't even fit onto a single GPU. I can see how memory communication can become a significant issue, since you'd have to manage that through Python if you're managing kernels from Python. (Though you technically could just throw all that responsibility down to the lower levels yet again... not a good idea, and it pollutes responsibilities.)

> but in practice that gets abused to put Python in the hot path

But if that's an abuse of the tools (which I agree it is), how does that make it the fault of the language rather than the user or package author? Isn't the language with the "rich library ecosystem" the natural place to glue everything together (including performant extensions in other languages), rather than the other way around? And so, in your example, wouldn't the solution just be to address the abuse in PyTorch rather than throw away the entire universe within which it's already functionally working?

The problem is that Python allows people to be lazy and ignore subtle performance issues. That's much harder in a lower level language. Obviously the tradeoff is that it'd slow down (or completely stop) some developers. I'm really just wondering out loud whether the constraints of a lower level language would help people write better code in this case, and whether that trade-off would be worth it.

FWIW I would be up for writing in C or something else, but I use Python for the packages / network effects.

It's not just about library availability. Python wins because it lets you offload the low-level performance work to people who really know what they're doing. Libraries like NumPy or PyTorch / Keras wrap highly optimized C/C++ code—so you get near-C/C++ performance without having to write or debug C yourself, and without needing a whole computer science degree to do so properly.

It's a mistake to assume C is always faster. If you don't have a deep understanding of memory layout, compiler flags, vectorization, cache behavior, etc., your hand-written C code can easily be slower than high-level Python using well-optimized libraries. See [1] for a good example of that.

Sure, you could call those same libs from C, but then you're reinventing Python's ecosystem with more effort and more chances to shoot yourself in the foot. Python gives you access to powerful, low-level tools while letting you focus on higher-level problems—in a language that’s vastly easier to learn and use.

That tradeoff isn't just convenience—it's what makes modern AI R&D productive at scale.

[1] https://stackoverflow.com/questions/41365723/why-is-my-pytho...

I feel like you're re-stating the same claim crote made: that there's a clean cut between Python and lower level libraries, meaning the user doesn't need to know what's happening at the lower level to achieve good performance. This is not true in many cases if you're aiming for peak performance, which we should be when training and serving AI systems, since they're already so resource hungry.

It isn't a claim, it's an empirical fact for which I've provided an example. The fact that another user and I made similar comments independently just goes to show how realistic this viewpoint is.

It's not an empirical fact. If you look at the code for AI frameworks, you'll see that this isn't true in practice once you go beyond a single isolated matrix multiplication.

OK, have an amateur implement a hand-written Fourier transform in C and have it beat NumPy's implementation. There, now you have two examples, and there are loads more: image and signal processing in general, data frame operations like grouping and joining, big integer arithmetic / cryptography in general, just plain old sorting, etc.

Amateurs, and even some folks who have worked with these things for a while, won't beat off-the-shelf Python calls to highly optimized and well-constructed libraries.
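Concretely (a hedged comparison; the hand-rolled version below is the same O(n^2) algorithm a straightforward C double loop would compute, so the gap is algorithmic, not linguistic):

```python
import numpy as np
from timeit import timeit

n = 2048
x = np.random.randn(n)

def naive_dft(signal):
    # Textbook O(n^2) DFT: build the Fourier matrix and multiply.
    # A hand-written C loop computes the same thing; NumPy's np.fft
    # is O(n log n) optimized C, which is why it wins by orders of magnitude.
    m = len(signal)
    k = np.arange(m).reshape(-1, 1)
    return np.exp(-2j * np.pi * k * np.arange(m) / m) @ signal

print("naive O(n^2):", timeit(lambda: naive_dft(x), number=5))
print("np.fft.fft:  ", timeit(lambda: np.fft.fft(x), number=5))
```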

I think we're talking past each other. I'm not suggesting that most users should be writing individual computational ops themselves in C; they'd certainly be unlikely to match the perf of expert-written C that has had time invested in it. The point I'm trying to make is that when you use a framework for modern AI, you're not just calling an individual op or many individual ops in isolation. It matters how multiple dependent ops are sequenced along with other code, e.g. data loading code (which may be custom and application-specific, so not available in a library). My argument is that it may be easier to reach peak performance on your hardware if that framework code were all written in a lower level language.

Ah, I see. For production or most real-time systems you'd be right, imo, but the time taken to complete most tasks in a conventional AI development environment means the overhead from Python moving from lib to lib becomes negligible, no?

I think it's more than just the available libraries. I think industry has just predominantly preferred Python. Python is a really rich, modern language. It might be quirky, but so is every single language you can name. Nothing is quite as quirky as JavaScript though; maybe VB6, but that's mostly dead, though slightly lingering.

Mind you I've programmed in all the mentioned languages. ;)

It's the ease of distribution of packages, and big functionality being a pip install away.

That's the killer feature. Whatever it is you want to do, there's almost certainly a package for it. The joke is that Python's the second best language for everything. It's not the best for web backends, but it's pretty great. It's not the best for data analysis, but it's pretty great. It's not the best at security tooling, but it's pretty great. And it probably is the best language for doing all three of those things in one project.

Wouldn't it be nice if popular libraries could export to .so files, so the best language for a task could use the bits & pieces it needed without a programmer needing to know Python (and possibly C)?

Were I to write a scripting language, trivial export to .so files would be a primary design goal.

Unfortunately the calling conventions and memory models are all different, so there's usually hell to pay going between languages. Perl passes arguments on a stack, Lisp often uses tagged integers, Fortran stores matrices in the other order, ... it goes on and on. SWIG (https://swig.org) can help a lot, but it's still a pain.
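Even in the happy case of Python calling plain C, you end up declaring the ABI by hand. A ctypes sketch (`libmylib.so` and its `scale` function are hypothetical):

```python
import ctypes

# Hypothetical shared object exporting: double scale(double x, int n);
lib = ctypes.CDLL("./libmylib.so")
lib.scale.argtypes = [ctypes.c_double, ctypes.c_int]
lib.scale.restype = ctypes.c_double  # forget this and you get silent garbage,
                                     # since the default return type is int

print(lib.scale(1.5, 3))
```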

Exporting to .so (a) makes it non-portable (you suddenly need to ship a whole compatibility matrix of .so files, including a Windows DLL or several) and (b) severely constrains the language design. It's very hard to do this without either forcing the developer to do explicit heap management using the caller's heap or very carefully hiding your VM inside the shared object... which has interesting implications once you have multiple such libraries. Also, you don't have a predefined entry point (there's no equivalent of DllMain), so your caller is forced to manage that and any multithreading implications.

It basically forces your language to be very similar to C.

> The joke is that Python's the second best language for everything.

Not for everything. For mobile apps it's still very poor, even if you plan only on prototyping rather than distribution. Same for frontend and desktop. For desktop you do have PyQt and PySide, but I would say the experience is not as good (you'd still be better off at least doing the UI in QML), and end-user distribution still sux.

I wish the Python mobile story would improve. Python 3.13 tries to improve support for Android and iOS, and BeeWare is also working on it. But right now the ecosystem of pip wheels built for mobile is very minimal.

Hah!

Ruby, Python, and Perl all had similarly good package ecosystems in the late 1990s, and I think any of them could have ended up as the dominant scripting language. Then Google chose Python as its main scripting language, invested hundreds of millions of dollars, and here we are. It's not as suitable as Matlab, R, or Julia for numerical work, but money made it good enough.

(Sort of like how Java and later JavaScript VMs came to dominate: you can always compensate for poor upfront design with enough after-the-fact money.)

I think that gives Google too much credit (blame?). Perl, for example, started to become increasingly painful as the objects users wanted to manipulate outstripped the natural reach of the language (hence the infamous modem-noise sigil pile-up, @$[0]->\$foo@ etc.). It also did not help that the Perl community took a ten-year diversion into Perl6/Raku. Circa 2005, Python looked like a fresh drink compared to Perl.

Yep. CPAN was impressive in the late 90s. I loved writing Perl at the time, other than the sigil explosion. The first time I wrote some Python (“OMG, everything is a reference?!”) was just about the last time I ever wrote any new Perl code.

I made that switch before I’d ever heard of Google, or Ruby for that matter. My experience was quite common at the time.

> That's the killer feature. Whatever it is you want to do, there's almost certainly a package for it.

Yes. Because C and C++ are never going to have a comparable package ecosystem, it almost makes sense for people to distribute such library projects as Python packages, simply because pip handles all the packaging for them.

This is actually rather a reason to avoid Python, in my opinion. You don't want pip to pollute your system with untracked files. There are tools like virtualenv to contain your Python dependencies, but this isn't the default, and pip is generally rather primitive compared to npm.

Ubuntu complains now if you try to use pip outside a virtual environment… I think things are in a basically ok state as far as that goes.

Arguably it could be made a little easier by automatically starting up a virtual environment if you call pip outside of one… but, I dunno, default behavior that papers over too many errors is not great. If they don't get a hard error, confused users might become even more confused when they don't learn they need to activate a virtual environment to get things working.

The industry standard has been Poetry for a good few years now, and uv is the newer exciting tool in this space. Both create universal lockfiles from more loosely specified dependencies in pyproject.toml, resulting in reproducible environments across systems (they create isolated Python environments per project).
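e.g. with a minimal pyproject.toml like this (hypothetical project), `uv sync` or `poetry install` resolves the loose constraints and pins the full dependency tree into a lockfile:

```toml
[project]
name = "demo"              # hypothetical
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",      # loose constraint here; the exact version of
]                          # requests and everything it pulls in gets
                           # pinned in uv.lock / poetry.lock
```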

pip, pipx, pipenv, conda, setuptools, poetry, uv, pdm, easy_install, venv, virtualenv

I really hope we are at the end game with poetry or uv. I can't take it anymore.

uv seems to me to be the next big one; PyCharm is already trying to integrate it, but it needs a lot more polish. Once the most-used Python tools adopt uv, it's pretty much game over. Of course, I always hope the industry adopts the best tool, but then they adopt the worst possible tools.

I rewrote a simple RAG ingestion pipeline from Python to Go.

It reads from a database. Generates embeddings. Writes it to a vector database.

  - ~10X faster
  - ~10X lower memory usage

The only problem is that you have to spend a lot of time figuring out how to do it.

All the instructions on the Internet, and even the vector database's own documentation, are in Python.

If speed and memory use aren't a bottleneck then "a lot of time figuring out how to do it" is probably the biggest cost for the company. Generally these things can be run offline and memory is fairly cheap. You can get a month of a machine with a ton of RAM for the equivalent of one hour of developer time of someone who knows how to do this. That's why Python is so popular.

>I rewrote a simple RAG ingestion pipeline from Python to Go

I also wrote a RAG pipeline in Go, using OpenSearch for hybrid search (full-text + semantic) and the OpenAI API. I reused OpenSearch because our product was already using it for other purposes, and it supports vector search.

For me, the hardest part was figuring out all the additional settings and knobs in OpenSearch to achieve around 90% successful retrieval, as well as determining the right prompt and various settings for the LLM. I've found that these settings can be very sensitive to the type of data you're applying RAG to. I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too.

> I'm not sure if there's a Python library that solves this out of the box without requiring manual tuning too

There are Python libraries that will simplify the task by giving a better structure to your problem. The knobs will be fewer and more high-level.

I have been using Python recently and have found that a lot of the data visualization tools seem to be wrappers around other languages (mostly JavaScript): things like agGrid, Tabulator, Plotly, etc.

Sometimes you end up embedding chunks of JavaScript directly inside your Python.

For example, the docs for the Streamlit implementation of AgGrid contain this: https://staggrid-examples.streamlit.app/Advanced_config_and_...
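The pattern looks roughly like this (a sketch based on the streamlit-aggrid API; the dataframe and the styling rule are made up):

```python
import pandas as pd
from st_aggrid import AgGrid, GridOptionsBuilder, JsCode

df = pd.DataFrame({"ticker": ["A", "B"], "change": [1.2, -0.7]})

# A chunk of JavaScript shipped as a Python string: exactly the
# embedding described above.
cell_style = JsCode("""
function(params) {
    return params.value < 0 ? {color: 'red'} : {color: 'green'};
}
""")

gb = GridOptionsBuilder.from_dataframe(df)
gb.configure_column("change", cellStyle=cell_style)
AgGrid(df, gridOptions=gb.build(), allow_unsafe_jscode=True)
```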

One ... could? But it doesn't seem particularly ergonomic.

Ergonomics isn't the point, performance is.

Nobody has ever, in the history of Python, called the Python C API easy.