I've been writing Python professionally for a couple of decades, and there've only been 2-3 times where its performance actually mattered. When writing a Flask API, the timing usually looks like: process the request for 0.1 ms, make a DB call for 300 ms, generate a response for 0.1 ms. Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run NumPy on it for 3 hours, write it back out for 3 seconds.

You could rewrite that in Rust and it wouldn't be any faster. In fact, a huge chunk of the common CPU-expensive stuff is already a thin wrapper around C or Rust, etc. Yeah, it'd be really cool if Python itself were faster. I'd enjoy that! It'd be nice to unlock even more things that were practical to run directly in Python code instead of swapping in a native code backend to do the heavy lifting! And yet, in practice, its speed has almost never been an issue for me or my employers.

BTW, I usually do the Advent of Code in Python. Sometimes I've rewritten my solution in Rust or whatever just for comparison's sake. In almost all cases, choice of algorithm is vastly more important than choice of language, where you might have:

* Naive Python algorithm: 43 quadrillion years

* Optimal Python algorithm: 8 seconds

* Rust equivalent: 2 seconds

Faster is better, but the algorithmic approach is a lot more important than the specific implementation.
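To make that concrete, here's a toy sketch (my own illustration, not an actual AoC puzzle): the classic "does any pair sum to a target?" check. Going from the naive pairwise loop to a set-based pass is the kind of algorithmic change that dwarfs any constant-factor win from switching languages.

```python
# Toy illustration: algorithm choice vs. language choice.
def has_pair_naive(nums, target):
    # O(n^2): checks every pair explicitly; hopeless for large n in any language
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return True
    return False

def has_pair_fast(nums, target):
    # O(n): one pass, remembering values seen so far
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False
```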

> Or writing some data science stuff, it might be like: load data from disk or network for 6 seconds, run NumPy on it for 3 hours, write it back out for 3 seconds.

> You could rewrite that in Rust and it wouldn't be any faster.

I was asked to rewrite some NumPy image processing in C++, because NumPy worked fine for 1024px test images but balked when given 40 Mpx photos.

I cut the runtime by an order of magnitude for those large images, even before I added a bit of SIMD (just to handle one RGBX-float pixel at a time, nothing even remotely fancy).

The “NumPy has uber-fast kernels that you can't beat” mentality leads people to use algorithms that do N passes over N intermediate buffers, all of which can easily be replaced by a single C/C++/Rust (even Go!) loop over the pixels.

Also reinforced by “you can never loop over pixels in Python - that's horribly slow!”

Same with OpenCV, and sometimes even with optimized matrix libraries in pure C++. These are all highly optimized, but often, to achieve what you want, you have to chain operations, which quickly eats up a lot of cycles just copying data around and making multiple passes that the compiler is unable to fuse. You can often beat that pretty easily with manual loop fusion, even if you are not an optimization god.

Fused expressions are possible using other libraries (numexpr is pretty good), but I agree that there's a reluctance to use things outside of NumPy.
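A rough sketch of the difference (illustrative sizes, not the parent's actual workload): each NumPy operator below materializes a full-size temporary and makes its own pass over memory, while numexpr compiles the whole expression and evaluates it blockwise in a single fused pass.

```python
import numpy as np
import numexpr as ne

# Three 4000x4000 float32 channels, ~64 MB each.
r, g, b = (np.random.rand(4000, 4000).astype(np.float32) for _ in range(3))

# Plain NumPy: every * and + allocates a temporary and re-reads memory.
luma_np = 0.299 * r + 0.587 * g + 0.114 * b

# numexpr: one fused, blockwise pass over the inputs.
luma_ne = ne.evaluate("0.299 * r + 0.587 * g + 0.114 * b",
                      local_dict={"r": r, "g": g, "b": b})
```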

Personally, though, I find it easier to just drop into C extensions at the point where NumPy becomes a limiting factor. They're so easy to do, and they let me keep the Python usability.
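For illustration, here's one low-friction version of that pattern using ctypes; the library name and C function signature are hypothetical, and the parent may well be using the CPython C API or Cython instead.

```python
# Sketch only: assumes a separately compiled libfilter.so exporting
#     void threshold(float *buf, size_t n, float cut);
import ctypes
import numpy as np

lib = ctypes.CDLL("./libfilter.so")          # hypothetical native library
lib.threshold.argtypes = [ctypes.POINTER(ctypes.c_float),
                          ctypes.c_size_t,
                          ctypes.c_float]
lib.threshold.restype = None

def threshold(image: np.ndarray, cut: float) -> np.ndarray:
    # Hand the raw float32 buffer to the native loop, keep the Python-facing API.
    buf = np.ascontiguousarray(image, dtype=np.float32)
    lib.threshold(buf.ctypes.data_as(ctypes.POINTER(ctypes.c_float)),
                  buf.size, cut)
    return buf
```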

That's because you're doing web stuff (I/O limited). So much of our computing experience has been degraded by this mindset being applied more broadly. Despite steady improvements in hardware, my computing experience has stagnated or even regressed in terms of latency, responsiveness, etc.

I'm not going to even go into the comp chem simulations I've been running, or that about 1/3 the stuff I do is embedded.

I do still use Python for web dev, partly because, as you say, it's not CPU-bound, and partly because Python's Django framework is amazing. But I have switched to Rust for everything else.

As a Java backend dev mainly working on web services, I wanted to like Python, but I have found it really hard to work on a large Python project because the autocomplete just does not work as well as it does for something like Java.

Maybe it is just due to not being as familiar with how to properly set up a Python project, but every time I have had to do something in a Django or FastAPI project, it is a mess of missing types.

How do you handle that with modern python? Or is it just a limitation of the language itself?

That's 100% an IDE thing. I use Zed (or Emacs or anything else supporting an LSP) and autocomplete is fast and accurate.

PyCharm has been fine. Just disable the AI stuff and you get accurate completion. It even has completion for Django ORM stuff, which is heavily dynamic.
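For what it's worth, most of the "mess of missing types" goes away once return types and models are annotated; the names below are made up, but this is the kind of code where Pylance/Jedi/PyCharm completion works well.

```python
from dataclasses import dataclass

@dataclass
class User:               # hypothetical model
    id: int
    email: str
    is_active: bool

def load_user(user_id: int) -> User | None:
    ...  # fetch from the DB; the annotated return type is what keeps callers typed

def deactivate(user: User) -> User:
    # The IDE knows `user` is a User here, so attribute completion and
    # typo-checking work much like they do in Java.
    user.is_active = False
    return user
```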

I won’t completely argue against that, and I’ve also adopted Rust for smaller or faster work. Still, I contend that a freaking enormous portion of computing workloads are IO bound to the point that even Python’s speed is Good Enough in an Amdahl’s Law kind of way.
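Back-of-the-envelope version of that Amdahl's Law point (my numbers, roughly matching the timings upthread): if a request spends 300 ms waiting on the database and a fraction of a millisecond in Python, even a 50x faster language barely moves the total.

```python
def total_time_ms(io_ms: float, cpu_ms: float, cpu_speedup: float) -> float:
    # IO is untouched by a faster language; only the CPU slice shrinks.
    return io_ms + cpu_ms / cpu_speedup

print(total_time_ms(300, 0.2, 1))    # ~300.2 ms in Python
print(total_time_ms(300, 0.2, 50))   # ~300.004 ms with a 50x faster language
```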

I hear this a lot, but can you really say that you're consistently saturating a 1 Gbps line for netcode, or a 6+ GB/s NVMe drive for disk data? In my experience this doesn't really happen with code that isn't intentionally designed to minimize unnecessary work.

A lot of slow parsing tends to get lumped in with IO, and this is where Python can be most limiting.

I don't personally use Python directly for super IO intensive work. In my common use cases, that's nearly always waiting for a database to return or for a remote network API to respond. In my own work, I'm saturating neither disk nor network. My code often finds itself waiting for some other process to do that stuff on its behalf.

It's been said that Python's greatest superpower is that it's the second-best language at the most stuff.

No one has really developed an ecosystem around a more performant language that can match Python's, and that's all it needs to assert dominance.

I've never understood this. Python cannot be optimized like C, C++ or Rust. It cannot do advanced functional things like OCaml, Haskell or Scala. It cannot run in browsers like TypeScript. It cannot do games programming like C# and it can't do crazy macro stuff like Clojure. I don't think it's even second best at those things.

I'm reading this as, "It cannot do things the best", and that's correct. It can't.

But it can do them well enough, and enough people know it that they can drag a solution across the line in most domains.

> That's because you're doing web stuff.

I guess you didn't notice where he talked about running NumPy?

And 300 ms for a DB call is slow, in any case. We really shouldn't accept that as a normal cost of doing business. 300 ms is only acceptable if we are doing scrypt-type things.

> in any case.

In some cases. Are you looking up a single indexed row in a small K-V table? Yep, that's slow. Are you generating reports on the last 6 years of sales, grouped by division within larger companies? That might be pretty fast.

I'm not sure why you'd even generalize that so broadly.

To put that in perspective, 300 ms is roughly enough time to loop over 30 GiB of data in RAM, load 800 MiB of data from an SSD, or sustain 1 TFLOP/s of arithmetic on a single core.

In 300 ms, a report query should be able to go through at least ~100M rows (on a single core).
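The rate assumptions behind figures like those (mine, and obviously hardware-dependent) work out roughly like this:

```python
# Rough, assumed hardware rates; real machines vary quite a bit.
ram_bw = 100e9   # ~100 GB/s memory bandwidth
ssd_bw = 2.7e9   # ~2.7 GB/s NVMe sequential read
flops  = 1e12    # ~1 TFLOP/s for a wide-SIMD single core
t = 0.3          # 300 ms

print(f"RAM streamed: {ram_bw * t / 2**30:.0f} GiB")   # ~28 GiB
print(f"SSD read:     {ssd_bw * t / 2**20:.0f} MiB")   # ~772 MiB
print(f"FLOPs:        {flops * t:.1e}")                # ~3e11 operations
```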

And the implicit assumption in that earlier comment of mine is, of course, not about the 100M-row scan. If there was any confusion, I am sorry.

That's all true, so long as you completely ignore doing any processing on the data, like evaluating the rows and selectively appending some of them into a data structure, then sorting and serializing the results, let alone optimizing the query plan for the state of the system at that moment and deciding whether it makes more sense to hit the indexes or just slurp in the whole table given that N other queries are also executing right now, or mapping a series of IO requests to their exact addresses on the underlying disks, and performing the parity checks as you read the data off the RAID and combine it into a single, coherent stream of not-block-aligned tuples.

There's a metric boatload of abstractions between sending a UTF-8 query string over the packet-switched network and receiving back a list of results. 300 ms suddenly starts looking like a smaller window than it first appears.

There is nothing more for us to take away from this discussion, so let me be the first to tone it down. All I want to say is: don't take that 300 ms as a given. It sits in an uncomfortable region, too short to be worth making an async op and too long to go unnoticed (anything between 50 ms and 2 s fits this bill). Most likely the query is doing something suspicious and would benefit most from a closer look.

I was totally with you until that last sentence, then you lost me again.

Saying a DB query is too long by giving an arbitrary number is like saying a rope is too long. That’s solely dependent on what you’re doing with it. It’s literally impossible to say that X is too long unless you know what it’s used for.


Sure, then you get a developer who decides to go with Flask for an embedded product, and it's an eye-watering slog.

People will always make bad decisions. For example, I'd also squint at a developer who wanted to write a new non-performance-critical network service in C. Or a performance-critical one, for that matter, unless there was some overwhelming reason they couldn't use Rust or even C++.

Advent of Code is deliberately set up to be doable in Python. You can also imagine a useful problem that takes Rust 2 weeks to chew through; how long would it take in Python?

And my experience is this: you start using ORMs, and maybe you need to format a large table once in a while. Then your Python just dies. Bonus points if you're using async to service multiple clients with the same interpreter.

And you're now forced to spend time hunting down places for micro-optimizations. Or worse, you end up with a weird mix of Cython and Python that can only be compiled on the developer's machine.


LOL, Python is plenty fast if you make sure it calls C or Rust behind the scenes. Typical of 'professional' Python people. Something too slow? Just drop into C. It surely sounds weird to everyone who complains about Python being slow when the response runs along these lines.

But that’s the whole point of it. You have the option to get that speed when it really matters, but can use the easier dynamic features for the very, very many use cases where that’s appropriate.

This is an eternal conversation. Years ago, it was assembler programmers laughing at inefficient C code, and C programmers replying that sometimes they don’t need that level of speed and control.

You are correct. However, it took only about 10 years for C compilers to beat hand-written assembly (for the average programmer), thus proving the naysayers wrong.

Meanwhile Python is just as slow today as it was 30 years ago (on the same machine).

People really misconstrue the relationship between Python and C/C++ in these discussions.

Those libraries didn't spring out of thin air, nor did they already exist.

People badly wanted to write and interface in Python; that's why you have all these libraries with substantial code in another language, yet research and development didn't just shift to that language.

TensorFlow is a C++ library with a Python wrapper. PyTorch has supported a C++ interface for some time now, yet virtually nobody actually uses TensorFlow or PyTorch in C++ for ML R&D.

If Python were fast enough, most people would be fine with, probably even happy about, ditching the C++ backends and having everything in Python, but the reverse isn't true. The C++ interface exists, and no one is using it. C++ is the replaceable part of this equation. Nobody would really care if Rust were used instead.

Even as a Fortran programmer, the majority of my flops come from BLAS, LAPACK, and that sort of library… putting me in the exact same boat as the Python programmers, really. The “professional” programmers in general don't worry too much about tying their identities to language choices, I think.

This is a very common pattern in high-level languages and has been a thing ever since Perl first came onto the scene. The whole point is that you use more ergonomic, easier-to-iterate languages like Perl or Python for most of your logic, and you drop down into C, C++, Zig, or Rust to write the performance-sensitive portions of your code.

When compiled languages became popular again in the 2010s, there was a renewed effort toward ergonomic compiled languages to buck this trend (Scala, Kotlin, Go, Rust, and Zig all gained their popularity in this timeframe), but there's still a lot of code written with the two-language pattern.

And then someone needs to cross the FFI border multiple times, and the gained perf starts hurting again.

If what one's doing in scientific computing needs to cross the FFI border multiple times, they're doing it wrong...

This assumes the boundary between Python and the native code is clean and rarely crossed.
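A small illustration of what a chatty boundary costs (my example, not the parent's): calling a NumPy ufunc once per element pays the Python-to-native crossing a million times, while the vectorized call crosses it once.

```python
import numpy as np

x = np.random.rand(1_000_000)

# Crosses the Python/native boundary per element: dominated by call overhead.
slow = np.array([np.sqrt(v) for v in x])

# Crosses the boundary once: the loop runs entirely in native code.
fast = np.sqrt(x)
```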

Exactly, most Python devs neither need nor care about perf. Most applications don't even need perf, because whether it's 0.1 seconds or 0.001 seconds, the user is not going to notice.

But this current quest to make Python faster is precisely because the sluggishness is noticeable for the tasks it's being used for most at the moment. That 6-second difference you note between the optimal Python and the optimal Rust is money on the table if it translates to higher hardware requirements or more server time. When everything is optimal and you could still be 4x faster, that's a tough pill to swallow if it means spending more $$$.

> most Python devs neither need nor care about perf.

You do understand that's a different but equivalent way of saying "If you care about performance, then Python is not the language for you," don't you?

Yes, I'm consistent in that. What I don't get is if that's the case, why is there such a focus on improving Python perf? At best they're getting marginal improvements on something that most Python devs claim they don't care about, and which they say is not important for Python as a language due to JIT, C interop, and so on.

I think perhaps their hope is that Python can eventually get to Go-level, if not Rust-level, performance if they keep up the optimizations. I do personally believe this to be possible. The motivating example is Julia, which is a high-level language with a low-level language's performance. Once Python arrives there, developers will care.

I agree, I think that's probably the hope. It's interesting you bring up Julia here because I was just reading the post about the 1.12 release and this comment struck me:

https://news.ycombinator.com/item?id=45524485

This part in particular is relevant to the Python discussion:

  What is Julia's central conceit? It aims to solve "the two language" problem, i.e. the problem where prototyping or rapid development is done in a dynamic and interactive language like Python or MATLAB, and then moved for production to a faster and less flexible language like Rust or C++.

  This is exactly what the speaker in the talk addresses. They are still using Julia for prototyping, but their production use of Julia was replaced with Rust. I've heard several more anecdotal stories of the exact same thing occurring. Here's another high profile instance of Julia not making it to production:

  https://discourse.julialang.org/t/julia-used-to-prototype-wh...

  Julia is failing at its core conceit.

So that's the question I have right now: what is Python supposed to be? Is it supposed to be the glue language that's easy to use and binds together a system made from other languages? Or is it trying to be what Julia is, a solution to the two-language problem? Because it's not clear Julia itself has actually solved that.

The reason I bring this up is that there's a lot of "cake having/eating" floating around these types of conversations: the idea that it's possible to be all the things, without a healthy discussion of what the tradeoffs are in going that direction, and what that would mean for the people who are happy with the way things are. These little % gains are all Python is going to achieve without actually asking the developer to sacrifice their development process in some way.

I think Julia has largely not solved it because it is clunky to use for purposes other than scientific computing. Python can't be accused of that: it's very nice for web development as well as scientific computing; the issue is just that for non-scientific-computing use cases the perf isn't great.

If you think Python is nice for scientific computing, you must never have tried MATLAB. Python's syntax for scientific computing is pretty clunky in comparison.

I used MATLAB for about 5 years, and then Mathematica, before switching to Python. I even had a job offer to work at MathWorks in Cambridge in about 2014!

And you still think Python has superior matrix manipulation syntax? Because that's at the core of scientific computing.

I think the syntax isn't that important, if I'm totally honest! The library support and ecosystem are much more useful to me than they ever were in MATLAB, and tbh I use np.einsum for anything tricky, partly because its performance is better anyway.
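For a flavor of that np.einsum style (a toy case of my own, not the parent's actual workload): a batched matrix multiply written as an explicit index contraction.

```python
import numpy as np

a = np.random.rand(32, 8, 16)   # batch of 32 (8x16) matrices
b = np.random.rand(32, 16, 4)   # batch of 32 (16x4) matrices

# "bij,bjk->bik": contract over j for each batch index b.
c = np.einsum("bij,bjk->bik", a, b)
assert np.allclose(c, a @ b)    # same result as batched matmul, shape (32, 8, 4)
```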

Never heard of people who didn't hate MATLAB.

"Logically equivalent" is a very limited subset of "equivalent (in meaning)". Language is funny like that.