> But according to the repo, this project also uses both slice and frame multi-threading (as does ffmpeg, with all the tradeoffs).

Oh, I missed that since it doesn't have a separate file. In that case they're likely very similar performance-wise. H.264 wasn't well designed for CPUs, since its arithmetic coding (CABAC) could've been done better, but that's not much of a challenge these days.

> And SIMD usage is basically table-stakes, and libavcodec uses SIMD all over the place?

SIMD _intrinsics_. libavcodec writes its DSP functions in assembly, and not for historical reasons - it's because assembly is just better! It's faster, just as maintainable, at least as easy to read and write, and not any less portable (since intrinsics already aren't portable…). Intrinsics are basically a poor way to generate the code you want, they interfere with other optimizations like autovectorization, and you might as well write your own code generator instead.

The downsides of assembly are that it's harder to debug and that analyzers like ASan don't work on it.
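
For anyone unsure what the distinction looks like in practice, here's a rough sketch of a typical DSP-style function (sum of absolute differences) written with x86 SSE2 intrinsics. This isn't taken from libavcodec or from this project; `sad_scalar` and `sad_simd` are names I made up for illustration, and it assumes an x86 target with SSE2 (it falls back to the plain C loop elsewhere).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Plain C reference: sum of absolute differences over n bytes. */
static unsigned sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (unsigned)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

/* Hypothetical intrinsics version: 16 bytes per iteration, scalar tail. */
static unsigned sad_simd(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a && b);
    size_t i = 0;
    unsigned sum = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PSADBW yields one partial sum in each 64-bit half. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Add the low and high halves of the accumulator. */
    sum = (unsigned)_mm_cvtsi128_si32(acc)
        + (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#endif
    /* Remaining bytes (or everything, without SSE2) go through plain C. */
    return sum + sad_scalar(a + i, b + i, n - i);
}
```

Note that you pick the instructions here, but the compiler still decides register allocation and scheduling. A hand-written assembly version keeps that control too.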