Sorry if this sounds naive, but does it make sense to write a codec library in C/ASM considering how well Rust is progressing, especially when, as the author puts it, AV2 decoding is roughly five times more complex than AV1 decoding?
The algorithms deployed in these kinds of codecs take into account not only human vision and the mathematical laws of information, but also the nitty-gritty details of how computers work, which are best exploited by having humans write detailed assembly directly rather than having a compiler make a best-effort guess.
Encoder and decoder writers frequently need extremely fine-grained control over SIMD instructions in order to get good performance.
The way they weave these instructions together can be very hard to express in a high-level language.
Further, there's a ton of work with arrays and, importantly, parts of arrays. They can, for example, need to extract every other element up to half the array. Unfortunately, rust has runtime array bounds checks which make writing that sort of code slower. The compiler can elide those checks, but usually only in simple cases.
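To make the bounds-check point concrete, here is a minimal sketch of the "every other element up to half the array" access pattern written two ways. The function names are made up for illustration, and whether the compiler actually removes the checks in the indexed version depends on the optimizer and the surrounding code, so treat this as an illustration rather than a benchmark.

```rust
/// Indexed version: each `src[i]` is a bounds-checked access unless the
/// optimizer can prove that `i < src.len()`.
fn every_other_indexed(src: &[u8]) -> Vec<u8> {
    let half = src.len() / 2;
    let mut out = Vec::with_capacity((half + 1) / 2);
    let mut i = 0;
    while i < half {
        out.push(src[i]);
        i += 2;
    }
    out
}

/// Iterator version: slicing to `..half` performs one check up front, and
/// the slice iterator then walks the sub-slice without per-element indexing.
fn every_other_iter(src: &[u8]) -> Vec<u8> {
    let half = src.len() / 2;
    src[..half].iter().step_by(2).copied().collect()
}
```

For a ten-element input holding 0 through 9, both functions return the elements at indices 0, 2 and 4. Real codec kernels tend to have much less regular access patterns than this, which is where the "only in simple cases" caveat bites.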
The authors would be writing a bunch of unsafe rust to get the performance they want and rust makes that more painful on purpose.
I like rust, but C/ASM really is the right choice here. This is one of the few cases where rust's safety is a major detriment.
Performance should not be priority #1. Security should be. Why do we slow down all CPUs to prevent SPECTRE attacks yet continue to write in C? As rav1d shows, the perf loss from migrating from C to Rust is far smaller than the loss from applying SPECTRE mitigations, and adding a sandbox around a memory-unsafe codec is going to be far more expensive again than using Rust code from the start.
> Performance should not be priority #1. Security should be.
For a web browser, or a server in a bank, sure. For anything else, questionable.
> adding a sandbox around a memory-unsafe codec is going to be way more expensive
In the modern world, the overhead of strong sandboxes is surprisingly small. The nuclear but most reliable option is a hardware-assisted VM. On modern computers with SLAT and virtualized I/O, the overhead for most use cases is negligible. If you want something lighter weight, you can use the multi-user nature of all modern OS kernels and isolate the codec in a separate process with restricted permissions. Sandboxing overhead is approximately zero.
> As rav1d shows
rav1d is not a full rewrite of dav1d to rust. So it really doesn't show that. It's currently C + rust + asm.
I don't think we can say anything about what this does or does not prove about the performance of safe code.
> Performance should not be priority #1. Security should be.
Entirely depends on the application. The reason rust has `unsafe` is that there are some situations where performance needs to take priority over potential security problems.
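As a rough illustration of that trade-off, the sketch below uses the standard library's `get_unchecked` to skip the bounds check on a strided read. The function name is hypothetical, and the safety argument lives in a comment that the programmer, not the compiler, has to keep true.

```rust
/// Sums every other element of the first half of `src` without bounds checks.
fn sum_every_other_unchecked(src: &[u8]) -> u32 {
    let half = src.len() / 2;
    let mut total = 0u32;
    let mut i = 0;
    while i < half {
        // SAFETY: `i < half` and `half <= src.len()`, so `i` is in bounds.
        total += u32::from(unsafe { *src.get_unchecked(i) });
        i += 2;
    }
    total
}
```

If the loop condition is ever changed so that the invariant in the SAFETY comment no longer holds, this becomes undefined behaviour, exactly as it would in C.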
Because it's 5 times more complex, you need to get the maximum performance available. Therefore more ASM than ever.
Rust does not bring more performance. Just more safety.
> Rust does not bring more performance. Just more safety.
Though more safety can in some cases bring a bit more performance. For instance, with Rust you can often avoid "defensive copies" of objects.
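A small sketch of what avoiding a "defensive copy" can look like. The `Frame` type and both functions are invented for this example. In the borrowed version, the shared reference guarantees that nothing mutates the buffer while it is being read, so the copy serves no purpose.

```rust
/// A frame buffer that some other component wants to inspect.
struct Frame {
    pixels: Vec<u8>,
}

/// Takes a shared borrow: the borrow checker rules out mutation of `frame`
/// for the duration of the call, so the data can be read in place.
fn checksum(frame: &Frame) -> u32 {
    frame.pixels.iter().map(|&p| u32::from(p)).sum()
}

/// What a defensive API might do when it cannot trust its caller:
/// copy the buffer first, then read the private copy.
fn checksum_defensive(frame: &Frame) -> u32 {
    let copy = frame.pixels.clone();
    copy.iter().map(|&p| u32::from(p)).sum()
}
```

Both functions compute the same value. The second one just pays for an allocation and a copy that the type system makes unnecessary here.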
The safety can be worth it in certain cases. Like when handling untrusted input. And it's not just Rust: look at WUFFS for example. WUFFS can actually rival handwritten implementations in certain cases.
Are video codecs in the present day able to be sandboxed? In my fantasies at least, I’d like the worst a malicious video file can do to be garbage output or a codec crash.
Forgive the ignorance, I have worked entirely in the abstracted layers of the stack, and mostly web.
but not these cases
It really should be, though: https://en.wikipedia.org/wiki/FORCEDENTRY
I don't see why not. What makes you think this is unique?
The ffmpeg devs have said many times in public that they routinely get speedups of 10x or more over C code from hand-written assembly. I'm not a reputable source on this myself, but I highly recommend looking into their channels, mailing lists, or posts.
https://youtu.be/nepKKz-MzFM&t=7195
If you can stand Lex Fridman for a bit, the VLC authors talk about why you use ASM for a video decoder instead of pure C or rust.
yes it makes sense to use C/ASM here, but if you're curious, there is a rust port of dav1d named rav1d: https://github.com/memorysafety/rav1d
it's not much slower than the original C/ASM implementation (last i checked ~5%?) but that matters here
It's a Rust/ASM port, look there: https://github.com/memorysafety/rav1d/blob/main/src/ext/x86/...
I am not sure if it is that much safer than the C version when raw assembly is still required.
The slowdown is much larger than 5%; other independent tests put it at around 20%.
there's a rav2d now too fwiw — https://github.com/stukenov/rav2d same playbook: safe Rust + asm kernels via FFI. just shipped 0.1.0.
fyi the Rust port already exists: https://github.com/stukenov/rav2d you keep the hand-written asm via FFI, rest is safe Rust. same trick rav1d uses.
Go ask FFmpeg what they're writing their encoders and decoders in.
That isn’t particularly helpful to someone asking a question in good faith. What others are using doesn’t clarify why they are using it. Plus, FFmpeg is itself a decade older than Rust. The OP is asking about starting a new project today.
> What others are using doesn’t clarify why they are using it.
It does if you ask them, or at least research the topic at hand.
Isn’t that just the same as answering “Google it”, then? We’re on a discussion forum that subject-matter experts visit, talking about a specific topic. If one can’t ask their questions in this highly relevant situation, where can they? The point of HN is supposed to be gratifying curiosity.
I don't know why you've been down-voted. It definitely isn't an optimal decision. A video codec isn't all assembly. There's plenty of plain unsafe C code. E.g. this is the first random file I clicked. It has a ton of raw C pointer stuff just begging to be exploited.
https://code.videolan.org/videolan/dav2d/-/blob/main/src/dat...
There is a project to write an AV1 decoder in Rust: Rav1d (really stretching the name here).
https://github.com/memorysafety/rav1d
They got within 5% of the performance of dav1d and held a contest to close the gap but I think I read somewhere that this wasn't achieved.
https://www.memorysafety.org/blog/rav1d-perf-bounty/
They claimed
> This is enough of a difference to be a problem for potential adopters, and, frankly, it just bothers us.
But in my opinion nobody actually cares about 5% in absolute terms. It's likely just Rust naysayers using that as an excuse.
I think the likely reason for dav2d using C is that they can reuse lots of code and infrastructure from dav1d. But I agree it would be much better if they worked on Rav2d instead (these names!). You can hardly complain about a 5% overhead if you're opting in to 5x more decoding complexity.
funny you mention it — rav2d exists now: https://github.com/stukenov/rav2d full C-to-Rust port, asm kernels still via FFI like rav1d does. early (0.1.0) but passes conformance against dav2d.
Yes? There is 5x more code to optimize the ASM for.