Is it faster? I’m pretty sure the main goal of this effort is just the “safer” part.

The performance was pretty much identical. However, as an added benefit, Blake2 got quite a bit faster due to a combination of 1) our code being slightly more optimized, and 2) Python's blake2 integration not actually doing runtime CPU detection, meaning that unless you compiled with -march=native (like, Gentoo), at the time, you were not getting the AVX2 version of Blake2 within Python. My PR fixed that and added code for CPU autodetection.
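To make the compile-time vs runtime distinction concrete, here is a rough Python sketch of what runtime dispatch means. It is only an illustration: the real detection happens in C inside the hash code, and `parse_cpu_flags` and `pick_blake2_impl` are hypothetical names I made up, not anything in CPython. It assumes Linux-style `/proc/cpuinfo` text with a `flags` line.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text (x86 Linux 'flags' line)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_blake2_impl(flags: set[str]) -> str:
    """Hypothetical dispatcher: choose a code path based on what the CPU reports
    at runtime, instead of baking the choice in at compile time."""
    if "avx2" in flags:
        return "avx2"
    return "portable"  # safe fallback that runs on any CPU
```

On a Linux box you could feed it `open("/proc/cpuinfo").read()`. The point is that a binary built without -march=native can still pick the AVX2 path when the machine it runs on supports it.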

Bear in mind that performance is tricky; see my comment about NEON: https://github.com/python/cpython/pull/119316#issuecomment-2...

The goal is to make things safer, yes, but speed is absolutely a major priority for the project and a requirement for production deployment, because the difference in speed for optimized designs vs naive ones might be an order of magnitude or more. It's quite speedy IME. To be fair to your point, though, it's also a moving target; "which is faster" can change as improvements trickle in. Maybe "shouldn't be much slower" is a better statement, but I was speaking generally :)
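Since it's a moving target, the easiest way to settle "which is faster" is to measure on your own build. A minimal sketch using only `hashlib` and `timeit` from the standard library; `throughput_mb_s` is a name I made up, and the buffer size and repeat counts are arbitrary assumptions, so treat the numbers as rough.

```python
import hashlib
import timeit


def throughput_mb_s(name: str, size: int = 1 << 20, repeat: int = 5) -> float:
    """Rough hashing throughput in MB/s for a hashlib algorithm name."""
    data = b"\x00" * size
    runs = 10  # hashes per timing sample
    # Take the best of several samples to reduce noise from other processes.
    best = min(
        timeit.repeat(lambda: hashlib.new(name, data).digest(), number=runs, repeat=repeat)
    )
    return (size * runs) / best / 1e6


if __name__ == "__main__":
    for algo in ("blake2b", "sha256", "sha3_256"):
        print(f"{algo}: {throughput_mb_s(algo):.0f} MB/s")
```

Running it on two Python versions (or two builds of the same version) gives a quick before/after comparison, though results will vary with CPU, compiler flags, and input size.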

(You could also argue that if the implementations that were replaced were insecure, their relative performance is ultimately irrelevant, but I'm ignoring that since it hinges on a known unknown.)

And safer is often slower, because avoiding timing attacks rules out some shortcuts.
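A small Python illustration of that trade-off, as a sketch rather than anything taken from the code under discussion: `leaky_equal` is a made-up name for the naive approach, which bails out at the first mismatch, while `hmac.compare_digest` from the standard library is designed to take time that doesn't depend on where the inputs differ.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> bool:
    """Naive comparison: fast, but the running time reveals how many leading bytes match."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False  # early exit is the timing leak
    return True


def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Comparison meant to resist timing analysis; it does not stop at the first difference."""
    return hmac.compare_digest(a, b)
```

Both return the same answers; the second just gives up the early-exit shortcut, which is the kind of cost being talked about here.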

I mean, most if not all of the code they're replacing (e.g. the vendored and vectorized Blake2 code) is also going to be designed and optimized with timing attacks in mind and implemented to avoid them. CVE-2022-37454 was literally found in a component that was optimized and had timing attack resistance in mind (XKCP from the SHA-3 authors). "Code that is already known to be wrong" is not really a meaningful baseline to compare against.