"Longer term, we will build on this to introduce a safe Carbon subset. This will be a large and complex undertaking, and won’t be in the 0.1 design."
If they can't get safety right at the design stage, they'll never get it right. We already have D and Zig in this space.
Zig is nowhere near memory safe. Even some basic semantics (passing arguments as values or references) are horribly broken. https://github.com/ziglang/zig/issues/5973
no need for memory safety to be in the language. It can still be checked at compile-time:
https://www.youtube.com/watch?v=ZY_Z-aGbYm8
This is a cool project, but it doesn’t address my original issue with Zig, which is that the language’s semantics are not even specified. That is, we cannot define what a “memory safe Zig program” is, because we cannot even define what a Zig program is! (Unless you define the semantics of a Zig program in terms of the implementation/translation to IR, which is fragile/bad.)
Second, I would be surprised if the static analyses in the tool are precise enough for real-world Zig programs. For example, it is undecidable to determine whether a function “takes ownership” of an argument pointer. In particular, if you want to avoid false negatives, the “free after transfer” case needs to be conservative, but then you almost certainly will flag false positives.
of course. There will absolutely need to be a way to specify intent in certain cases or override the judgement of the checker. That may come in the form of external annotation, or possibly internal annotation, which in principle is achievable using the same technique as you see in the "unitful" demo, though that's not ideal.
As for Zig being specified: well, it's pre-1.0, and the authors have, I believe, specifically called out specification as being "the first priority after 1.0".
Given that Carbon's space is "languages with full interoperability with C++," I don't think D and Zig are in that space.
As to "getting it right" - things are not so simple. The emphasis on memory-safety soundness is based on some empirical hypotheses, some better founded than others, and it's unclear what "getting it right" means.
From a software correctness perspective, the road to sound memory safety is as follows:
1. We want to reduce the amount of costly bugs in software as cheaply as possible.
2. Memory-unsafe operations are a common cause of many costly bugs.
3. Some or all memory bugs can be eliminated cheaply with sound language guarantees.
The problem is that 1. memory safety refers to several properties that don't all contribute equally to correctness (e.g. out-of-bounds access causes more serious bugs than use-after-free [1]), and 2. soundly guaranteeing different memory safety properties has different costs. It gets more complicated than that (e.g. there are also unsound techniques that have proven very effective to consider), but that's the overview.
It is, therefore, as of yet unclear which memory safety properties are worth it to soundly guarantee in the language, and the answer may depend on the language's other goals (and there must be other goals that are at least as important, because the empty language guarantees not only all memory safety properties but all (safety [2]) correctness properties, yet nobody uses it as it's useless, while a language like ATS can be used to write many useful programs, but few use it because it's just too costly to use well). The goal is always to find the right balance.
For example, Java soundly guarantees lack of use-after-free at the cost of increased memory footprint; that may be "getting it right" for some programs but not all. Rust soundly guarantees lack of use-after-free at the cost of imposing strong and elaborate typesystem constraints (that, as is often the case, are more constraining than the property they guarantee); that, too, may be "getting it right" for some programs, though not all. Zig guarantees lack of out-of-bounds access in a simple language at the cost of not guaranteeing lack of use-after-free, and that may also be "getting it right" for some programs but not all.
So what "getting it right" means always depends on constraints other than safety (Rust and Zig want to consume less memory than Java; Java and Zig want to be simpler than Rust; Java and Rust want to guarantee more memory safety properties than Zig). If Carbon wants to be more interoperable with C++ than Java, Rust, or Zig, then it will have to figure out what "getting it right" means for Carbon.
[1]: https://cwe.mitre.org/top25/archive/2024/2024_cwe_top25.html
[2]: https://en.wikipedia.org/wiki/Safety_and_liveness_properties
> As to "getting it right" - things are not so simple. The emphasis on memory-safety soundness is based on some empirical hypotheses, some better founded than others, and it's unclear what "getting it right" means.
It means eliminating undefined behavior, and unplanned interaction between distant parts of the program.
Eliminating undefined behaviour is a means to an end (it reduces problematic bugs, but not all undefined behaviour is equally responsible for such bugs), and it's not a binary thing (virtually all programs need to interact with software written in languages that don't eliminate undefined behaviour, so clearly there's some tolerance for the possibility of undefined behaviour).
Don't get me wrong - less undefined behaviour is better. But drawing a binary line between some and none makes for a convenient talking point; it isn't necessarily the sweet spot in the complicated and context-dependent series of tradeoffs that is software correctness.
By definition, all C++ Undefined Behaviour is unbounded. You may believe, and may even have practical evidence for the compilers which happen to exist today, that in some cases the behaviour is in fact bounded, but that's not what the language definition says, and optimisers have a funny way of making fools of people who mistake Undefined for "Eh, it'll probably do what I meant".
It might seem as though incrementing a signed integer past its maximum can't be as problematic as a use after free even though both are Undefined Behaviour, but nah, in practice in real C++ compilers today they can both result in remote code execution.
There is a place for Unspecified results, for example having it be unspecified whether a particular arithmetic operation rounds up or down may loosen things up enough that much faster machine code is generated and well, the numbers are broadly correct still. But that's not what Undefined behaviour does.
Yes, that's what UB means - the program loses all meaning and any effect is possible - but the importance of a bug is much more than just its blast radius. We ultimately care not about the language semantics of a bug but its expected loss of value. This is also dependent on how frequently the bug appears and how easy it is to prevent or find and fix. Not all UBs are equal on that front.
Furthermore, an unbounded blast radius isn't itself the direct problem. A bug that with some probability causes your program to crash and your disk to be deleted is far less dangerous than a bug that allows a remote attacker to relatively easily steal all your secrets. UBs also differ on that front.
And again, virtually all programs are not provably without UB. For example, a Java program still interacts with an OS or with some native library that might suffer from a UB. So clearly we do tolerate some probability of UB, and we clearly do not think that eliminating any possibility of UB is worth any price.
When a program is just code on the screen, it's just a mathematical object, and then it's easy to describe a UB - the loss of all program meaning - as the most catastrophic outcome. But software correctness goes beyond the relatively simple world of programming language semantics, and has to consider what happens when a program is running, at which point it is no longer a mathematical object but a physical one. If a remote attacker steals all our secrets, we don't care if it's a result of some bug in the program itself (due to UB or otherwise), in other software the program interacts with, some fault or weakness in the hardware, or human operator error. The probability of any of these things is never zero, and we have to balance the cost of addressing each of these possibilities.
To give an example in the context of Carbon, we know that old code tends to suffer from fewer severe bugs than new code. So, if we want to reduce the probability of bugs, it is possible that it may be more worthwhile to invest - say, in terms of language complexity budget - in interop with C++ code than in eliminating every possible kind of UB, including those that are less likely to appear, sneak past testing, and cause an easily exploitable vulnerability.
> the most catastrophic outcome
Nah, this is in the context of C++ so UB isn't the most catastrophic situation. Undefined Behaviour is a behaviour, which means we might well be able to avoid invoking the behaviour and then it's fine. For example if your C++ program has a use-after-free in the code invoked only by the "Add file" feature but otherwise works fine, we can just ensure operators never use "Add file" and the UB, no matter what it might be, won't happen.
C++ has IFNDR: clauses which say a construct is "Ill-Formed, No Diagnostic Required". The program itself was never actually a valid C++ program and has absolutely no defined meaning; there may be no indication of a problem from your compiler, but alas it's entirely meaningless. This isn't behavioural, it's an invisible status of the entire program.
As I said, UB means that a program has no meaning in the source language. That doesn't change anything I wrote above about the impact of UB. A hardware fault may also make a program meaningless, BTW, yet we all accept that every program written in any language is currently susceptible to UB or to a hardware fault.
If you want, you can think of UB like a mathematical singularity in some physical theory. The theory of the language has nothing to say about what happens in such a program. But that doesn't mean that we can't reasonably talk about what happens not using that theory [1]. Indeed, one of the reasons that UB are of concern is that some of them are frequent causes of security exploits - and that's the thing we ultimately care about, not that the program loses its semantics. But again, not all of them are equally common causes of such outcomes, and not all of them are equally hard to avoid in the first place.
[1]: In fact, this is easy to explain in software: the programming language can say nothing about the meaning of a program with UB - indeed, it has no meaning in that language - but because we do have an executable, the program still has a well-defined meaning in the machine language it compiled to (machine language has no UB, or, even if some machine architecture does declare that some instruction stream is UB, most programs with UB in some programming language still do not compile to a program with UB in machine code). So the program that has no meaning in C++ still has meaning in machine code, and as that is the program we ultimately run and care about, we can talk about which UBs are more or less likely to result in which machine code behaviour.
This seems like it missed the point entirely?
UB is a behaviour, and it's unbounded, so when invoked it's an immediate disaster; "time travel" UB can make this harder to reason about, because the as-if rule means that although the behaviour hasn't in some sense "happened" yet, it can have consequences earlier. But if we avert the behaviour it won't happen. It is not correct to say that UB means the entire program had no meaning.
You give the "mathematical singularity" analogy, so consider division. We doubtless agree that 6 divided by 3 is 2, and 6 divided by 2 is 3. But how about 6 divided by 0? This is not defined; we cannot perform such an operation. But division is not as a result somehow entirely without meaning, it just has this well understood limitation. Likewise for software with UB that we can avert.
IFNDR is a catastrophe because it truly does render the entire software without meaning.
> IFNDR is a catastrophe because it truly does render the entire software without meaning.
In the language. I.e. the language assigns no meaning to the source program, which is, indeed, the "catastrophic" impact of UB (or IFNDR) within the theory of the language. But a running program takes the form of an executable, and the executable always has a defined meaning in another language (machine code). While C++ has nothing to say about what such programs do (i.e. that's where the helpfulness of that theory ends), that doesn't mean we can't talk about, or care about, the meaning of the executable.
An executable that crashes and an executable that leaks all your secrets have very different consequences, and while the C++ spec says absolutely nothing about the relationship between different UBs and these behaviours, that doesn't mean that these relationships don't exist.
A mathematical singularity in a physical theory means that that particular theory has nothing to say about the physics of that situation, not that there's no actual physics going on, and the physics that is actually going on could be more or less "catastrophic" depending on what we mean by that.
Splendid reply! I'm a big fan of Carbon and so I really appreciate when people make solid arguments for its tradeoff space.
I counted the word “sound” in this reply 8 times. Whenever I see the word “sound”, the word “Rust” is always nearby. It is just a coincidence, of course.
Swift seems to be doing a decent job of this (and C++ interop for that matter)
> Swift seems to be doing a decent job of this
It keeps adding keywords and has become much harder to keep in your head; it’s over 220 keywords at this point. Don’t take my word for it: Swift’s creator doesn’t agree with its current direction either.
I'm not sure why I should take the word of the guy who added a bunch of these keywords so he could use Swift for AI
Curious, what keywords were those? I think Chris Lattner was not involved with Swift after version 3
https://github.com/swiftlang/swift/blob/main/docs/Differenti...