If the language has unsafe regions, it doesn't entirely remove classes of bugs, since they can still occur in unsafe regions.
(Predictable response: "But they can only occur in unsafe regions which you can grep for" and my response to that: "so?")
I suppose the better response is that it removes those classes of bugs where they are absolutely unnecessary. Tricky code will always be tricky, but in the straightforward 80% (or more) of your code such bugs can be completely eliminated.
It's unfortunate that C has so many truly unnecessary bugs which are caused only by stupid, overly "clever" exploitation of undefined behaviour by compilers.
Unfortunate, yes.
But what bugs? Suboptimal choices, maybe, but any backwards-compatible, popular language is going to have its share of those.
The ones GP is referring to all go away when you use -O0. They're entirely artificial, constructed by compiler writers language-lawyering the standard. They were unforeseeable to the people who actually wrote the language, who expected interpretations like "dereferencing null crashes the program" or "dereferencing null accesses the interrupt vector table", and absolutely were not expecting "dereferencing null deletes the previous three lines of code".
Which I would definitely recommend as a strong default.
No matter whether you are using C for "freedom" or "flexibility" or "power", 95% of the time you only need that in a very small portion of your codebase. You almost certainly do _not_ need any of that in, say, the logic to parse CLI arguments or config files, which is, however, a prime example of a place where vulnerabilities are known to happen.
Which is why, in the past, I would reach for something like Perl in its heyday, given its coverage of the UNIX API in its standard library, for anything manipulating CLI tools or config files.
Nowadays, pick your scripting language, and if C is really needed, place it cleanly in a loadable module, keeping all the security invariants in that scripting or managed language, instead of writing 100% pure C source.
My solution since the early 2000s.
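As a concrete picture of that kind of logic, here is a sketch of a tiny `key = value` config parser in safe Rust. The format, the function name `parse_config`, and the "return the bad line number" error convention are all made up for illustration. Every slice and split is bounds-checked by the standard library, so a malformed file produces an `Err` rather than an out-of-bounds read or write.

```rust
use std::collections::HashMap;

/// Hypothetical format: one `key = value` per line, `#` starts a comment line.
/// On error, returns the 1-based number of the first malformed line.
fn parse_config(text: &str) -> Result<HashMap<String, String>, usize> {
    let mut map = HashMap::new();
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        // split_once splits at the first '='; no manual index or pointer arithmetic.
        let (key, value) = line.split_once('=').ok_or(idx + 1)?;
        let key = key.trim();
        if key.is_empty() {
            return Err(idx + 1);
        }
        map.insert(key.to_string(), value.trim().to_string());
    }
    Ok(map)
}

fn main() {
    let text = "# demo config\nname = demo\nthreads=4\n";
    match parse_config(text) {
        Ok(cfg) => println!("name = {:?}", cfg.get("name")),
        Err(line) => println!("malformed line {}", line),
    }
}
```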
Agreed, there's a lot to gain from gluing C to a more protected language; I'm a fan of embedding a scripting language.
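One way to picture that gluing, with Rust standing in for the "more protected language" (the comments above have scripting languages in mind, but a C extension module has the same shape): the C function is declared once, and a single safe wrapper encodes or checks C's preconditions, so the rest of the program never handles raw pointers. This sketch calls libc's real `strlen`; the wrapper names `c_strlen` and `c_strlen_of` are hypothetical, and the `unsafe extern` block syntax assumes a recent (1.82 or newer) Rust toolchain.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// The C side: libc's `strlen`, which Rust's std already links against.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// The only `unsafe` in this sketch. The `&CStr` parameter type encodes
/// strlen's precondition: a valid pointer to a NUL-terminated string.
pub fn c_strlen(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` is non-null, NUL-terminated, and valid for the whole call.
    unsafe { strlen(s.as_ptr()) }
}

/// Boundary check for untrusted input: a string with an interior NUL
/// is rejected here, before anything reaches C.
pub fn c_strlen_of(s: &str) -> Option<usize> {
    let owned = CString::new(s).ok()?;
    Some(c_strlen(&owned))
}

fn main() {
    println!("{:?}", c_strlen_of("hello"));
    println!("{:?}", c_strlen_of("he\0llo"));
}
```

The design choice is that the invariant lives in a type (`CStr`) and one checked constructor (`CString::new`), so callers can't forget it.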
> Predictable response: "But they can only occur in unsafe regions which you can grep for" and my response to that: "so?"
The situation is both worse than this and better than this. Consider the .set_len() method on Rust's Vec. It's unsafe, because you could just .set_len(1_000_000), and then the Vec would happily let you try to read the nonexistent millionth element and segfault (or worse). However, if you could edit the standard library sources, you could add this new method to Vec without touching any unsafe code:
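A self-contained sketch of that idea, using a hypothetical `MiniVec` type as a stand-in for std's `Vec` (a standalone example can't edit the real standard library). `MiniVec`, its fields, `with_capacity`, `push`, `get`, and the stated invariant are illustrative assumptions; `set_len_totally_safe_i_promise` is the method described in the comment.

```rust
// Hypothetical stand-in for std's Vec: same idea, much smaller.
mod mini_vec {
    use std::mem::MaybeUninit;

    /// Invariant that the `unsafe` blocks below rely on:
    /// `len <= buf.len()`, and the first `len` slots of `buf` are initialized.
    pub struct MiniVec<T> {
        buf: Box<[MaybeUninit<T>]>, // private: code outside this module can't touch it
        len: usize,                 // private, like Vec's real `len` field
    }

    impl<T> MiniVec<T> {
        pub fn with_capacity(cap: usize) -> Self {
            let buf: Box<[MaybeUninit<T>]> = (0..cap).map(|_| MaybeUninit::uninit()).collect();
            MiniVec { buf, len: 0 }
        }

        /// Hands the value back instead of reallocating when full.
        pub fn push(&mut self, value: T) -> Result<(), T> {
            if self.len == self.buf.len() {
                return Err(value);
            }
            self.buf[self.len].write(value);
            self.len += 1;
            Ok(())
        }

        pub fn len(&self) -> usize {
            self.len
        }

        pub fn get(&self, i: usize) -> Option<&T> {
            if i < self.len {
                // SAFETY: relies on the invariant: every slot below `len` is initialized.
                Some(unsafe { self.buf.get_unchecked(i).assume_init_ref() })
            } else {
                None
            }
        }

        /// Mirrors `Vec::set_len`: `unsafe` because the caller must uphold the invariant.
        pub unsafe fn set_len(&mut self, new_len: usize) {
            self.len = new_len;
        }

        /// Same body, no `unsafe` anywhere, yet safe callers can now break the
        /// invariant: `set_len_totally_safe_i_promise(1_000_000)` followed by
        /// `get(999_999)` reads far past the buffer. Never call it like that.
        pub fn set_len_totally_safe_i_promise(&mut self, new_len: usize) {
            self.len = new_len;
        }
    }

    impl<T> Drop for MiniVec<T> {
        fn drop(&mut self) {
            for slot in &mut self.buf[..self.len] {
                // SAFETY: relies on the same invariant: these slots are initialized.
                unsafe { slot.assume_init_drop() };
            }
        }
    }
}

fn main() {
    let mut v = mini_vec::MiniVec::with_capacity(4);
    v.push("a").unwrap();
    v.push("b").unwrap();
    // Shrinking happens to be harmless here (`&str` has no destructor to skip)...
    v.set_len_totally_safe_i_promise(1);
    println!("len = {}, get(1) = {:?}", v.len(), v.get(1));
    // ...but nothing in the type system stops `v.set_len_totally_safe_i_promise(1_000_000)`.
}
```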
This is exactly the same as the real set_len, except it's a "fn" instead of an "unsafe fn". Now the Vec API is totally broken, and safe callers can corrupt memory. Also, critically, we didn't write any unsafe code in "set_len_totally_safe_i_promise". The key detail is that this new method has access to the private self.len field of Vec that unsafe blocks in the same module rely on.

In other words, grepping for all the unsafe blocks isn't sufficient for saying that a program is UB-free. You also have to make sure that none of the safe code ever violates an invariant that the unsafe blocks rely on. Read the comments, think really hard, etc.
So...what's the point of all this? The point is that it lets us define a notion of "soundness", such that if I only write safe code, and I only use libraries that are "sound", we can guarantee that my program is UB-free. In other words, any UB in my program would necessarily point to a bug in one of my dependencies, in the stdlib, or in the compiler. (Or, you know, in the hardware, or in mathematics itself.) In other other words, instead of auditing my entire gigantic (safe) program for UB, we can reduce the problem to auditing my dependencies for soundness. Crucially, this decouples the difficulty of the problem from the size of my program.

This wouldn't be very interesting if "safe code" were some impoverished subset, like "unsigned integer arithmetic only". But in fact safe code can use pointers, tagged unions, pointers into tagged unions, heap allocation/freeing, and multithreading. Lots of large, complicated, useful, real-world programs are written in 100% safe code.

Here's the version of this story with all the caveats and footnotes: https://jacko.io/safety_and_soundness.html
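To make that list concrete, here is a small sketch that touches each feature with no `unsafe` at all. The names `Shape`, `widen`, and `total_area` are made up for illustration; `#![forbid(unsafe_code)]` is a real crate-level lint attribute that turns any `unsafe` block in the crate into a compile error. Under the soundness argument above, any UB in this program would have to come from std or the compiler.

```rust
#![forbid(unsafe_code)] // any `unsafe` block in this crate is a hard compile error

use std::f64::consts::PI;
use std::sync::{Arc, Mutex};
use std::thread;

/// A tagged union: the compiler tracks which variant is live.
enum Shape {
    Circle { r: f64 },
    Rect { w: f64, h: f64 },
}

fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { r } => PI * r * r,
        Shape::Rect { w, h } => w * h,
    }
}

/// A pointer into the tagged union: `w` is a `&mut f64` aimed at a field of
/// the `Rect` variant, and the borrow checker keeps it from outliving `s`.
fn widen(s: &mut Shape) {
    if let Shape::Rect { w, .. } = s {
        *w *= 2.0;
    }
}

/// Heap allocation, freeing, and threads, all in safe code.
fn total_area(shapes: Vec<Box<Shape>>) -> f64 {
    let total = Arc::new(Mutex::new(0.0_f64));
    let handles: Vec<_> = shapes
        .into_iter()
        .map(|shape| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                let a = area(&shape);
                *total.lock().unwrap() += a;
                // `shape` (a heap allocation) is freed when this closure ends, exactly once.
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let sum = *total.lock().unwrap();
    sum
}

fn main() {
    let mut rect = Shape::Rect { w: 2.0, h: 3.0 };
    widen(&mut rect);
    let shapes = vec![Box::new(Shape::Circle { r: 1.0 }), Box::new(rect)];
    println!("total area = {:.3}", total_area(shapes));
}
```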
You still need to audit the safe part for other bugs...
But yes, this is nice and we should (and probably will) have a safe mode in C too.
Usually they can also happen outside, if you did something wrong in the unsafe region.
edit: I'm sorry that my Captain Obvious moment is turning out to be a truth bomb for some. Please keep downvoting as a way to regain your inner peace.
> if you did something wrong in the unsafe region.
*you or anyone else in your chain of dependencies that uses unsafe
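A minimal sketch of both of those replies: the mistake sits inside an `unsafe` block, but it is triggered from ordinary safe calling code, possibly in a downstream crate that only ever sees the safe signature. The function names `nth_oops` and `nth` are hypothetical; `get_unchecked` is the real slice method.

```rust
/// UNSOUND on purpose: a safe signature wrapped around an unsafe block that
/// trusts its caller. The bug lives in the unsafe region, but any safe caller
/// (in this crate or a dependent one) can trigger it.
pub fn nth_oops(v: &[u32], i: usize) -> u32 {
    // BUG: nothing checks `i < v.len()`, so `nth_oops(&v, 10)` on a short
    // slice is an out-of-bounds read, i.e. undefined behaviour.
    unsafe { *v.get_unchecked(i) }
}

/// Sound version: the function itself establishes the precondition before
/// entering `unsafe`, so no safe caller can misuse it.
pub fn nth(v: &[u32], i: usize) -> Option<u32> {
    if i < v.len() {
        // SAFETY: bounds checked on the line above.
        Some(unsafe { *v.get_unchecked(i) })
    } else {
        None
    }
}

fn main() {
    let v: [u32; 3] = [1, 2, 3];
    println!("{:?} {:?}", nth(&v, 1), nth(&v, 10));
    // Only ever call the unsound version with an in-bounds index.
    println!("{}", nth_oops(&v, 2));
}
```

In practice the sound version would just be `v.get(i).copied()`; the explicit check is there to show where the `SAFETY` reasoning has to come from.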