> Formally verified _always_ beats anything else.

Formally verified in an obscure language where it's difficult to find maintainers does not beat something written in a more "popular" language, even if it hasn't been formally verified (yet?).

And these days I would (unfortunately) consider assembly as an "obscure language".

(At any rate, I assume Rust versions of cryptographic primitives will still have some inline assembly to optimize for different platforms, or, at the very least, make use of compile intrinsics, which are safer than assembly, but still not fully safe.)

With crypto, you really want to just write the assembly, due to timing issues that higher level languages simply cannot guarantee.

It's insanely complex, particularly you want _verified_ crypto. Last year (or two years ago?) I had to fix a tiny typo in OpenSSL's ARM assembly for example, it was breaking APT and Postgres left and right, but only got triggered on AWS :D

You don't want to write the whole thing in assembly, just the parts that need to be constant time. Even those are better written as called subroutines called from the main implementation.

Take BLAKE3 as an example. There's asm for the critical bits, but the structural parts that are going to be read most often are written in rust like the reference impl.

Yes, for sure.

I would like a special purpose language to exist precisely for writing crytographic code where you always want the constant time algorithm. In this niche language "We found a 20% speed-up for Blemvich-Smith, oops, it actually isn't constant time on the Arrow Lake micro-code version 18 through 46" wouldn't even get into a nightly let alone be released for use.

It seems that for reasons I don't understand this idea isn't popular and people really like hand rolling assembly.

There's been plenty, like RobustIsoCrypt or FaCT:

https://github.com/PLSysSec/FaCT

They struggle to guarantee constant time for subroutines within a non-constant time application, which is how most people want to use cryptography.

I do think this is pretty much the one use case for a true "portable assembler", where it basically is assembly except the compiler will do the register allocation and instruction selection for you (so you don't have to deal with, e.g., the case that add32 y, x, 0xabcdef isn't an encodable instruction because the immediate is too large).

You can't avoid those with NASA Power of 10 sorts of restrictions?