PyPy deserves much more credit (and much wider use) than it gets. The underperformance of the Faster CPython project [0] shows how difficult it is to optimize a Python implementation, and highlights just how impressive PyPy really is.
[0] The article says "Python has gotten nearly 50% faster in less than four years", but the original goal was a 5x speedup in the same timeframe [https://github.com/markshannon/faster-cpython/blob/master/pl...].
> The article says "Python has gotten nearly 50% faster in less than four years", but the original goal was a 5x speedup in the same timeframe
IIRC they originally expected the JIT to be the single focus of CPython performance improvement. But then another front was opened to tackle the GIL in parallel [1]. Perhaps the overhead of two major "surgeries" in the CPython codebase at the same time contributed to slower progress than originally predicted.
[1] https://peps.python.org/pep-0703/
The main culprit is not wanting to change the C ABI of the VM.
Other equally dynamic languages have long shown the way.
But what do people actually use Python for the most, at least as far as industry is concerned? Interfacing with those C extensions.
PyPy does have an alternative ABI that integrates with the JIT and also works on CPython, so if people cared that much about those remaining bits of performance, they could support it.
That is the sad part of it all.
The culture sees writing C as writing Python, and it took Microsoft and Facebook stepping in for them to care.
Now with Microsoft out of the loop, let's see how much support the whole CPython JIT project will keep getting.
I really wish the PSF would adopt PyPy as a separate project. It is so underrated. People still think it supports only a subset of Python code and that it is slow with C FFI code.
But the latest PyPy supports all of Python 3.12 and it is just as fast with C FFI code as with JIT-compiled Python code. It is literally magic, and if it were more popular Python would not have a reputation for being slow.
PyPy is amazing and it's actually a bit baffling that it's not the default that everyone is using in production. I've had Python jobs go from taking hours to run, down to minutes simply by switching to PyPy.
Do you happen to know if Flask is supported by any chance?
Yes. I've had a small webapp running under it quite happily (complete overkill, but it's a personal project and I was curious).
Very basic hello world app hosted under gunicorn (just returning the string "hello world", so hopefully this is measuring the framework time). Siege set to do 10k requests, 25 concurrency, running that twice so that they each have a chance to "warm up", the second round (warmed up) results give me:
So it seems like there's definitely things that pypy's JIT can do to speed up the Flask underpinnings.
Yes, have been using Flask on PyPy3 for years. I get about a 4x speedup.
I just tested it and it works perfectly.
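For anyone who wants to repeat the "run it twice so it can warm up" measurement from the comment above without setting up gunicorn and Siege, here is a minimal stdlib-only sketch. `handle_request` is a hypothetical stand-in for one request's worth of pure-Python work (it is not Flask), and `time_rounds` is just an illustrative helper; the round and call counts are assumptions. Run the same file under `python` and `pypy3` and compare the second round.

```python
import platform
import time


def handle_request():
    """Hypothetical stand-in for one request's worth of pure-Python work."""
    parts = []
    for i in range(50):
        parts.append(str(i * i))
    return ",".join(parts)


def time_rounds(fn, rounds=2, calls=10_000):
    """Call fn `calls` times per round; return the seconds each round took."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(calls):
            fn()
        timings.append(time.perf_counter() - start)
    return timings


# The first round includes any JIT warm-up; the second round is the
# "warmed up" number the comment refers to.
timings = time_rounds(handle_request)
print(platform.python_implementation(), ["%.3fs" % t for t in timings])
```

This only times a pure-Python loop in one process, so it says nothing about the web server or the network path; it is just a way to see the warm-up effect for yourself.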
Unfortunately it keeps being the black swan in the Python community.
Python is probably the only programming language community that has been so much against JITs, and where folks routinely call C library bindings "Python".
It's not a black swan. The issue is that using Pypy means accepting some potential compatibility hassle, and in return you get a reasonable speedup in your Python code, from glacial to tolerable. But nobody who has accepted glacial speed really needs tolerable speed.
It's like... imagine you ride a bike to most places. But now you want to visit Australia. "No problem, here take this racing bike! It's only a little less comfortable!".
So really it's only of interest to people who have foolishly built their entire business on Python and don't have a choice. The only one I know of is Dropbox. I bet they use Pypy.
By the time they could have switched to PyPy, they already had too many C extensions that were not compatible with PyPy at that time, and instead of improving PyPy they tried to develop their own LLVM-based JIT Python, and they failed at that. They should have ported those extensions to CFFI or just helped PyPy improve context support. But NIH much: they built their own PyPy alternative for years and failed.
I don't get why PyPy and CPython don't simply merge. It will be difficult, organization-wise... but not impossible.
When people think of C library wrappers as Python, it is kind of a hard sell.
HPy is a new alternative. It performs the same as cpyext, and the same on PyPy.
Why do people feel the need to comment this on every single JIT post? Like imagine commenting on every post about Pepsi "Coca-cola exists since 1886".
Because Pypy wasn't even _mentioned_ in the JIT PEP (https://peps.python.org/pep-0744/), like it's the black sheep the family isn't supposed to talk about.
Because as proven multiple times, the problem isn't Python, rather CPython, and many folks keep mixing languages with implementations.
Because it is one of the most ambitious projects in the open-source world and very little is known about it. It is neglected by the Python contributor community for unknown reasons (something political, it seems). It was developed as a PHD research project by really good researchers. PyPy wrote Python in pure Python and surpassed the performance of Python written in C by 4-20x. They delivered Python with a JIT and also static RPython, which is a subset of Python that compiles directly to a binary. I have also personally worked with some of the lead PyPy developers on commercial projects, and they are the best developers to work with.
> PHD
Do you know that it's PhD because the h is part of the word philosophy?
Sorry I was on mobile
If memory serves, PyPy supports a subset of Python and focused their optimizations on software transactional memory.
Back in 2022 it worked fine with literally all modules except some ssh, ssl and C based modules.
With a little bit of tinkering (multiprocessing, choosing the right libraries written strictly in python, PyPy plus a lot of memory) I was able to optimize some workflows going from 24h to just 17 minutes :) Good times...
It felt like magic.
The "C based modules" bit is the kicker. A significant chunk of Python users essentially use it as a friendly wrapper for more-powerful C/C++ libraries underneath the hood.
They've long since fixed the C based modules interaction, unfortunately a lot of common knowledge is from when it couldn't interact with everything.
If you've written it off on that basis, I'd suggest it's worth giving it another shot at some stage. It might surprise you.
Last I saw there was still a little bit more overhead around the C interface, so hot loops that just call out to a C module in the loop can be just a smidgen slower, but I haven't seen it be appreciably slower in a fair while.
The FAQ states it is often much slower:
> We have support for c-extension modules (modules written using the C-API), so they run without modifications. This has been a part of PyPy since the 1.4 release, and support is almost complete. CPython extension modules in PyPy are often much slower than in CPython due to the need to emulate refcounting. It is often faster to take out your c-extension and replace it with a pure python or CFFI version that the JIT can optimize.
https://doc.pypy.org/en/latest/faq.html#do-c-extension-modul...
I have seen great success with cffi though.
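To make the "replace it with a CFFI version" advice from the FAQ quote concrete, here is a minimal sketch of calling a C function through CFFI's ABI mode. It assumes a POSIX system where `ffi.dlopen(None)` exposes the standard C library (this does not hold on Windows), and it guards the import because `cffi` is a third-party package on CPython. `c_strlen` is a hypothetical helper name, and `strlen` is only a toy stand-in for whatever your extension actually wraps.

```python
try:
    from cffi import FFI  # third-party on CPython; bundled with PyPy
except ImportError:
    FFI = None


def c_strlen(data):
    """Return strlen(data) via CFFI, or None if CFFI/libc is unavailable."""
    if FFI is None:
        return None
    ffi = FFI()
    # Declare the C signature we intend to call.
    ffi.cdef("size_t strlen(const char *s);")
    try:
        # Assumption: on POSIX, dlopen(None) gives access to the C library.
        libc = ffi.dlopen(None)
    except OSError:
        return None
    # A bytes object can be passed where a `const char *` is expected.
    return libc.strlen(data)


length = c_strlen(b"hello world")
print(length)
```

Real bindings would more likely use CFFI's API (compiled) mode against your own headers; the inline ABI mode is used here only because it needs no build step.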
I see, and it's a pretty short list:
https://doc.pypy.org/en/latest/cpython_differences.html#exte...
""" The extension modules (i.e. modules written in C, in the standard CPython) that are neither mentioned above nor in lib_pypy/ are not available in PyPy. """
The lifecycle of generators makes PyPy code very verbose without refcounting. I've already been bitten by generator lifecycles and shared resources. PEP 533, which would have fixed this, was deferred. Probably for the best, as it seems a bit heavy-handed.
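A small sketch of the verbosity being described, under the assumption that the "shared resource" is released in a generator's `finally` block. On CPython, dropping the last reference usually runs that `finally` right away; without refcounting it only runs whenever the garbage collector finalizes the generator, so you end up closing generators explicitly. `read_items`, `first_item`, and the `events` log are hypothetical names for illustration.

```python
from contextlib import closing

events = []  # records when the pretend resource is acquired and released


def read_items(items):
    """Generator that holds a (pretend) shared resource while it is alive."""
    events.append("acquire")
    try:
        for item in items:
            yield item
    finally:
        # Runs when the generator is exhausted, closed, or finalized.
        events.append("release")


def first_item(items):
    # closing() calls gen.close() on exit, so the finally block above runs
    # here on any implementation instead of at some later GC pass.
    with closing(read_items(items)) as gen:
        return next(gen)


result = first_item([1, 2, 3])
print(result, events)
```

The extra `with closing(...)` wrapper around every partially consumed generator is the kind of boilerplate the comment is complaining about.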
Yep, I had a script that was doing some dict mapping and re-indexing, wrote the high level code to be as optimal as possible, and switching from cpython to pypy brought the run time from 5 minutes to 15 seconds.
If pypy worked with Retux the game would get a big boost. Although the main issue is that it tried to redraw many objects at once per frame.
Not a subset. It covers 100% of pure Python. CPyExt is working fine; it just needs optimization in some parts. The private CPyExt calls that some libraries use as hacks are the only things PyPy does not officially support (the PyO3 Rust-Python bindings use those).