Because in the real world, for code where performance matters, you run the profiler and find that the time is spent either on I/O or inside native code.

This might have been your experience, but mine has been very different. In my experience a typical Python workload is 50% importing Python libraries, 45% slow Python wrapper logic, and 5% fast native code. I spend a lot of time rewriting the Python logic in C++, which makes it 100x faster, so the resulting profile approaches "10% fast native logic, 90% useless Python imports".

There is more than one PEP related to making imports faster, such as PEP 690 or PEP 810. It's definitely a well-known problem. The solution is probably right around the corner.
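Until one of those lands, you can approximate the effect by hand with the importlib.util.LazyLoader recipe from the stdlib docs. A minimal sketch (the json import at the end is just an example module, not anything from those PEPs):

    import importlib.util
    import sys

    def lazy_import(name):
        # Resolve the module, but defer executing its body until first attribute access.
        spec = importlib.util.find_spec(name)
        loader = importlib.util.LazyLoader(spec.loader)
        spec.loader = loader
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        loader.exec_module(module)
        return module

    json = lazy_import("json")     # nothing is actually executed yet
    print(json.dumps({"a": 1}))    # the module body runs here, on first use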

Imports being slow is annoying, but it only matters for short-running code.

Many simple scripts at my work that do little more than parse arguments with argparse and fire off an HTTP request spend half a minute importing random stuff because of unneeded dependencies and uncommon code paths. For some unit tests it's 45 seconds, substantially longer than the time taken to run the test logic itself.
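CPython's -X importtime flag prints a per-module breakdown of where that half minute goes; you can also get a cruder picture by just timing the imports yourself. A rough sketch, with requests standing in for whatever heavy dependency a given script actually pulls in:

    import time

    t0 = time.perf_counter()
    import argparse                 # stdlib, cheap
    print(f"argparse: {time.perf_counter() - t0:.3f}s")

    t0 = time.perf_counter()
    import requests                 # placeholder for a heavy third-party dependency
    print(f"requests: {time.perf_counter() - t0:.3f}s")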

In dev cycles most code is short-running.

> Many simple scripts at my work [...] For some unit tests it's 45 seconds

> I spend a lot of time rewriting the python logic in C++, which makes it 100x faster

Nice! It sounds like your workplace didn't care to pick a better tool for the job in the past, and doesn't much care what you're doing at present either, if you have to spend time rewriting the stuff in C++ instead of picking Nim and calling it a day, in a day.

Even better, in Nim these little CLI tools could use https://github.com/c-blake/cligen and could have had colorized, auto-generated terminal help for many years now, with much less dev effort than raw argparse. Start-up time of statically linked Nim programs is on the order of 100-500 microseconds, just like C programs.

Have you thought about packing that stuff into an executable, or precomputing or preloading it? There are techniques for each of those things that help in some scenarios, e.g. the preloading sketch below.
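For the preloading option, one pattern is to pay the import cost once in a long-lived parent process and fork a cheap child per task. A Unix-only sketch, with numpy standing in for whatever is slow to import:

    import os
    import sys

    import numpy as heavy_lib   # placeholder for the slow-to-import dependency

    def run_task(arg):
        # By the time we get here, the import cost has already been paid, once.
        print(f"pid {os.getpid()} handling {arg!r} with {heavy_lib.__name__} preloaded")

    for arg in sys.argv[1:]:
        pid = os.fork()          # child inherits already-imported modules via copy-on-write
        if pid == 0:
            run_task(arg)
            os._exit(0)
        os.waitpid(pid, 0)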

If imports are slow, you need to not be writing Python in the first place, because you are either on limited hardware or you are writing a very performance-sensitive app.

I do a bit of performance work and find most often that things are mixed: there's enough CPU between syscalls that the hardware isn't being fully utilized, but there's enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python looks really awful compared to rewrite-it-in-Rust.

This "There are no hot spots, it's just a uniform glowing orange" situation is why Google picked C++ and then later Rust and to some extent why they picked Go too.

I am, indeed, a C++ developer. :-)

When it's a drop-in replacement, as it is for most of my code (and it's dead simple to check whether it runs by just trying pypy ./main.py), I don't see why you should run the code 5-50% slower for no reason, though.

[deleted]

IRL you will have CPU-bottlenecked pure Python code too. But that alone isn't enough to take on the unknown risk of switching to a less well-supported interpreter. Worst case, you just put in the effort to convert the hot parts to multiprocessing.
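Something like this minimal sketch, where cpu_heavy stands in for whatever the real hot loop is:

    from multiprocessing import Pool

    def cpu_heavy(n):
        # Stand-in for a CPU-bound pure-Python hot spot.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:                      # defaults to one worker per core
            results = pool.map(cpu_heavy, [10**6] * 8)
        print(sum(results))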

Also, the engineer time you would spend optimizing for performance costs more than just throwing more hardware at it.

For cloud jobs that can be true, but for single-threaded, dev-in-the-loop work you can't just buy a 100x faster processor than the one already in the dev machine, and the latency is expensive workflow friction.

Not if you have certain types of scientific data. You can't rent enough hardware to run the slow code.

That's the thing with single-threaded CPU operations: you can't throw more hardware at them.

In this situation, "more hardware" would mean throwing a faster CPU at it.

It caps out quickly. If you have a newish Mac, you're already pretty much at the max.