> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does (I can't imagine about the long tail which receives less attention).
I think this post has some good information in it, but this particular claim is overstated: I look at crate discrepancies pretty often as part of reviewing dependency updates, and >90% of the time it's a single-line difference (like a timestamp, a hash, or some other drift between the state of the tree at tag time and its state at release time). These are non-ideal from a consistency perspective, but they aren't cause for this degree of alarm -- we do know what the code does, because the discrepancies are often trivial.
Not only that, but we can check what the discrepancy is precisely because crates.io distributes source code, not binaries, so packages can always be inspected. In the end, what's in crates.io is the source of truth.
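For anyone who wants to run this kind of check themselves, here's a rough sketch (Python, not tested against real crates) of comparing a published crate's files with the repo at the tagged commit. It assumes you've already extracted the `.crate` tarball and checked out the matching commit (packages published from a git checkout record that commit in `.cargo_vcs_info.json`); for a crate inside a workspace, point `tagged` at the crate's subdirectory. `read_tree`, `changed_lines`, and `compare_trees` are hypothetical helper names. Cargo rewrites `Cargo.toml` when packaging and keeps the author's copy as `Cargo.toml.orig`, so the sketch compares that copy against the repo's manifest.

```python
from __future__ import annotations

import difflib
from pathlib import Path

# Files cargo generates or rewrites when packaging; they never match the repo verbatim.
GENERATED = {".cargo_vcs_info.json", "Cargo.toml"}


def read_tree(root: str) -> dict[str, str]:
    """Map each file under `root` (as a relative POSIX path) to its text contents."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in base.rglob("*")
        if p.is_file()
    }


def changed_lines(old: str, new: str) -> int:
    """Rough count of lines that differ between two versions of a file."""
    matcher = difflib.SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def compare_trees(published: dict[str, str], tagged: dict[str, str]) -> dict[str, int | None]:
    """Report every published file that differs from the tagged checkout.

    Values are the number of changed lines (0 means only line endings or a
    trailing newline differ), or None if the file is missing from the repo.
    Files that exist only in the repo (e.g. excluded from the package) are skipped.
    """
    report: dict[str, int | None] = {}
    for path in sorted(published):
        if path in GENERATED:
            continue
        # cargo keeps the author's original manifest as Cargo.toml.orig.
        repo_path = "Cargo.toml" if path == "Cargo.toml.orig" else path
        if repo_path not in tagged:
            report[path] = None
        elif published[path] != tagged[repo_path]:
            report[path] = changed_lines(tagged[repo_path], published[path])
    return report
```

Usage would be something like `compare_trees(read_tree("demo-1.0.0"), read_tree("demo-checkout"))`. Files that exist only in the published crate come back as `None`, and those are the ones worth a closer look; a result like `{"src/lib.rs": 1}` is the kind of one-line difference described above.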
Isn't the point that unless actually audited each time, the code could still be effectively anything?
Yes, but that's already the case. My point was that, in practice, the discrepancies observed so far don't represent a complete disconnect between the ground truth (the source repo) and the package index; they tend to be minor. So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
I think it just depends on whether or not you interpret the phrase "no one knows" neutrally or pessimistically.
Saying that there could be something there but "no one knows" doesn't mean that there is something there. It's still true, though.
If that's the case, it would be a lot simpler (and equally accurate) to say that "no one knows" what the source repo is doing, either! The median consumer of packages in any packaging ecosystem is absolutely not reading the entire source code of their dependencies, in either the ground truth or index form.
That's certainly true, and it would also be true (maybe even more so) if vendoring dependencies were widespread. It seems just as easy to hide things in a "vendored" directory that's 20x the size of the library.
> So describing the situation as "nobody knows what 17% of the top crates.io packages do" is an overstatement.
Note that you willfully cut the qualifying "virtually" from that quote, thereby turning it into an overstatement:
> Let me rephrase this, 17% of the most popular Rust packages contain code that virtually nobody knows what it does
That wasn't intentional. But also, I don't think "virtually" actually changes the meaning substantially; it has the same conventional meaning in that position as "effectively" or "might as well be nobody."
Serious consideration: Claude Mythos is going to change the risk envelope of this problem.
We're still thinking in the old mindset, whereas new tools are going to change how all of this is done.
Within a few years, dependencies will undergo various kinds of automated vetting: bugs (of various categories), memory, performance, correctness, etc. We need to think about how to handle this at scale instead. We're not ready for it.
I deliberately don't update the version in Cargo.toml in the codebase. I patch it in just before cargo publish, since otherwise every other open PR would have to change too.
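A minimal sketch of that workflow, assuming the version is a literal double-quoted string in `[package]` (not `version.workspace = true`). It's line-based rather than a real TOML parser, and `set_package_version` / `publish_with_version` are hypothetical names. Because the manifest is edited after the last commit, cargo needs `--allow-dirty` to publish, and the published `Cargo.toml.orig` will differ from the tagged one by exactly that version line.

```python
import re
import subprocess
from pathlib import Path

# A table header such as [package], [dependencies.foo] or [[bin]], with an optional comment.
TABLE_HEADER = re.compile(r"^\s*\[\[?\s*([^\]]+?)\s*\]\]?\s*(#.*)?$")
# A literal `version = "..."` line; `version.workspace = true` deliberately does not match.
VERSION_LINE = re.compile(r'^(\s*version\s*=\s*)"[^"]*"(.*)$')


def set_package_version(manifest: str, version: str) -> str:
    """Return `manifest` with the version in its [package] table replaced."""
    out: list[str] = []
    table = None
    replaced = False
    for line in manifest.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        header = TABLE_HEADER.match(body)
        if header:
            table = header.group(1)
        elif table == "package" and not replaced:
            m = VERSION_LINE.match(body)
            if m:
                # Keep the key's spacing and any trailing comment, swap only the value.
                line = f'{m.group(1)}"{version}"{m.group(2)}{ending}'
                replaced = True
        out.append(line)
    if not replaced:
        raise ValueError('no literal version = "..." found in [package]')
    return "".join(out)


def publish_with_version(version: str, manifest_path: str = "Cargo.toml") -> None:
    """Patch the version into the manifest, then publish (hypothetical release step)."""
    path = Path(manifest_path)
    path.write_text(set_package_version(path.read_text(), version))
    # The tracked manifest is now modified, so cargo needs --allow-dirty to publish.
    subprocess.run(
        ["cargo", "publish", "--manifest-path", str(path), "--allow-dirty"],
        check=True,
    )
```

A release job would then call something like `publish_with_version("1.4.0")`, while the committed Cargo.toml keeps a placeholder version that feature PRs never touch.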
> we do know what the code does
You know if you check. Hardly anyone checks. It's just normalization of deviance, and eventually someone will exploit it.