I'm cryptographically ignorant. What does this mean for Python?

https://en.wikipedia.org/wiki/Formal_verification

Provable correctness is important in many different application contexts, especially cryptography. The benefit for Python (and for any other user of the library in question) is fewer bugs and thus greater security assurance.

If you look at the original bug that spawned this work (https://github.com/python/cpython/issues/99108), the poster reported:

"""As evidenced by the recent SHA3 buffer overflow, cryptographic primitives are tricky to implement correctly. There might be issues with memory management, exceeding lengths, incorrect buffer management, or worse, incorrect implementations in corner cases.

The HACL* project https://github.com/hacl-star/hacl-star provides verified implementations of cryptographic primitives. These implementations are mathematically shown to be:

    memory safe (no buffer overflows, no use-after-free)
    functionally correct (they always compute the right result)
    side-channel resistant (the most egregious variants of side-channels, such as memory and timing leaks, are ruled out by construction)."""

Here is how I read it: CPython's hashlib module has long been a wrapper around certain functions exported from OpenSSL (with vendored fallback implementations for some algorithms), but after the SHA3 buffer overflow was found in that vendored code, the CPython project decided to wrap a library called HACL*, which uses formal verification to rule out buffer overflows and other similar bugs.

Extreme ELI5 TL;DR: Your Python programs using the CPython interpreter and its built-in cryptographic modules will now be safer and faster, with no need for you to do anything.
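
For concreteness, nothing changes at the API level -- code like this keeps working as-is and simply picks up the new backend on CPython versions that ship it (a sketch; exactly which algorithms are HACL*-backed depends on the CPython version):

    import hashlib

    # Same stdlib API as always; on CPython builds that ship the HACL*
    # backend, these calls transparently use the verified implementations.
    print(hashlib.sha3_256(b"hello world").hexdigest())
    print(hashlib.blake2b(b"hello world", digest_size=32).hexdigest())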

Who uses its built-in cryptographic modules though? I wrote a fair bit of cryptographic code in Python more than five years ago, and most people recommended the cryptography package https://pypi.org/project/cryptography/ over whatever is built in.

I'm a big fan of pyca/cryptography and I use it for any serious project, but if I just need hashing I tend to use the standard library's hashlib - it saves a somewhat heavy dependency, and the API is less verbose.
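
Roughly what I mean by less verbose - a sketch comparing the two for a plain SHA-256 digest (using pyca/cryptography's hazmat Hash API; recent versions no longer need an explicit backend argument):

    import hashlib

    # stdlib: one line, no third-party dependency
    h1 = hashlib.sha256(b"some data").hexdigest()

    # pyca/cryptography: same result, but requires the package and goes
    # through its Hash object API
    from cryptography.hazmat.primitives import hashes

    h = hashes.Hash(hashes.SHA256())
    h.update(b"some data")
    h2 = h.finalize().hex()

    assert h1 == h2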

Also, pyca/cryptography uses OpenSSL. OpenSSL is fine, but has the same "problem" as the previous Python stdlib implementation. (Personally I think it's an acceptable risk. If anything, swapping in 15,000 lines of new code is the greater risk, even if it's "verified".)

I'm curious why you put "verified" in scare-quotes, and why you think adopting formally verified code is a greater risk.

I don't think formal verification is bad, I just don't think it's a silver bullet. i.e. we should not get complacent and assume that formally verified code is 100% safe.


Is it faster? I’m pretty sure the main goal of this effort is just the “safer” part.

The performance was pretty much identical. However, as an added benefit, Blake2 got quite a bit faster, due to a combination of 1) our code being slightly more optimized, and 2) Python's Blake2 integration not actually doing runtime CPU detection, meaning that unless you compiled with -march=native (like, say, Gentoo), you were not, at the time, getting the AVX2 version of Blake2 within Python -- my PR fixed that and added code for CPU autodetection.
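
If you want to see what your own build is doing, a rough micro-benchmark along these lines is usually enough (purely illustrative; results depend on the CPU, build flags, and whether runtime CPU detection kicks in):

    import hashlib
    import time

    # Rough blake2b throughput check for the current interpreter build.
    data = b"\x00" * (1 << 20)   # 1 MiB block
    iterations = 200

    start = time.perf_counter()
    for _ in range(iterations):
        hashlib.blake2b(data).digest()
    elapsed = time.perf_counter() - start

    # Each iteration hashes 1 MiB, so iterations / elapsed is MiB/s.
    print(f"blake2b: {iterations / elapsed:.1f} MiB/s")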

Bear in mind that performance is tricky -- see my comment about NEON https://github.com/python/cpython/pull/119316#issuecomment-2...

The goal is to make things safer, yes, but speed is absolutely a major priority for the project and a requirement for production deployment, because the difference in speed for optimized designs vs naive ones might be an order of magnitude or more. It's quite speedy IME. To be fair to your point, though, it's also a moving target; "which is faster" can change as improvements trickle in. Maybe "shouldn't be much slower" is a better statement, but I was speaking generally :)

(You could also argue that if the previous implementations being replaced were insecure, their relative performance is ultimately irrelevant, but I'm ignoring that since it hinges on a known unknown.)

And safer is often slower, for example to avoid timing attacks.

I mean, most if not all of the code they're replacing (e.g. the vendored and vectorized Blake2 code) was also designed and optimized with timing attacks in mind and implemented to avoid them. CVE-2022-37454 was literally found in a component that was optimized and written with timing-attack resistance in mind (XKCP from the SHA-3 authors). "Code that is already known to be wrong" is not really a meaningful baseline to compare against.
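
For anyone wondering what timing-attack awareness looks like at the Python level (as opposed to inside the C primitives), the classic user-facing example is comparing secret digests with a constant-time comparison instead of ==; a sketch using only the stdlib:

    import hashlib
    import hmac

    def verify_digest(message: bytes, expected_hex: str) -> bool:
        # A plain == on hex strings can short-circuit at the first differing
        # character, which in principle leaks timing information;
        # hmac.compare_digest compares in constant time.
        computed = hashlib.sha3_256(message).hexdigest()
        return hmac.compare_digest(computed, expected_hex)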