I thought Rust treated undefined behaviour as a compiler bug? Does anyone know what's actually happening here?

"unsafe" is a promise to the compiler that you're going to ensure invariants that the compiler can't check. Rust only promises to eliminate UB if the invariants are held. You can still get UB by violating that promise, as this bug demonstrates.

But the title here says "in safe Rust", no? Is the unsafe code causing UB in safe code? I thought the unsafety couldn't "spread" like that in Rust.

That is not Rusts guarantee. The guarantee is that safe rust cannot in itself introduce UB - UB can only ever be introduced in unsafe blocks, but it can then materialize in safe code.

Ah OK, that makes sense, thanks.

It can spread into safe code when you build an incorrect "safe" abstraction around unsafe code. Which the Bun Rust port apparently has.

I can't tell if you're trolling but `unsafe { crash() }` is safe from the compiler's perspective. Otherwise you wouldn't be able to achieve anything in 'safe' rust, even print to stdout.

I think its a good question, just because the whole UB thing is such an ideological shibboleth.

Maybe its better to think about this in the reverse, where C and C++ has 'defined behavior', but unsafe rust intentionally does not, its just whatever the complier and platform lets you get away with. Ultimately its still just a computer which stores values in memory and jumps to subroutines.

Every language has defined behavior. It's what you expect to happen through a program's execution. Sometimes there will be multiple possibilities, but you can still define them regardless. Laying this out explicitly is the purpose of a standard.

Undefined behavior is everything else. C and C++ are relatively unique in that their standards explicitly say "combining these constructs in this way is undefined", and we call those cases explicit UB. There's also a larger universe of implicit UB that standards omit. Most (all?) languages have implicit UB, even if they lack the explicit stuff. What happens when you get ENOMEM is a common one.

Rust does something similar to C/C++ and lists a bunch of UB that's only possible with incorrect code in unsafe blocks. Correct code placed in an unsafe block remains defined, as does code without unsafe (up to compiler/language bugs).

Yeah, if I understand correctly, the Rust project has no intention to formally 'define' what unsafe actually does, so its very implicit. Could be anything... so it's the Does It Work? standard.

> I thought the unsafety couldn't "spread" like that in Rust.

The goal of a library is to provide the encapsulation such that the unsafety doesn't spread.

If undefined behavior occurs, the fault lies with whoever wrote `unsafe { ... }` in the body of a function. If I write "unsafe" in order to call an unsafe library function, and I don't meet the library function's pre-requisites, then it's my fault. If the library internally writes "unsafe" in order while providing a safe wrapper, and I never actually wrote `unsafe { ... }`. If neither I nor the library wrote `unsafe { ... }`, then it is the fault of the compiler.

Using "in safe Rust" means that `unsafe` doesn't occur either in the user code nor in the library. In this context, since we've heard how many uses of `unsafe { ... }` exist in the Bun rewrite, I'd read "in safe Rust" to mean "without calling any functions marked as unsafe".

If you use unsafe improperly, it is possible to encounter UB in "safe" code which relies on the unsafe code being correct.

it's more straightforward to write safe rust when rust owns everything, In real world you often are interfacing with underlying libs or systems etc, which you need to treat as invariants but also handle yousrelf manually to make guarantees to compiler. unsafe exists in tons of codebases it's just you have to make sure you encapsulate it properly, which is what this bug is.

Unsafe code can break certain invariants of Rust, as `unsafe` is just a compiler "hold my beer" flag, which is why you're meant to do safety checks in your safe interface around unsafe code. If the unsafe code is wrapped in a way that does no guarding (or does something stupid in general), it is technically marked safe (because you said "rustc, hold my beer" as `unsafe` is also a contract) despite actually being unsafe

UB != unsafe

Rust has lots of undefined behavior, in general a broadly similar set to that which exists in C. What Rust does that is different is that to trigger undefined behavior, you need to execute unsafe code. (This isn't the same as saying that you have to be in unsafe code--you can violate a precondition in unsafe code and have the UB itself trigger in safe code).

That's a good explanation, thank you.

I'm sure there have been attempts at defining a language that has no UB, but afaik all meaningful languages have UB in some dark corner or enumerated explicitly. For example, Java thread execution order is UB.

> For example, Java thread execution order is UB.

In this context "UB" means something different than how you're using it. The UB being mentioned here is the "nasal demons" form, i.e., programs which contain undefined behavior have no defined meaning according to the language semantics.

What you're talking about is probably better described in this context as "unspecified behavior", which is behavior that the language standard does not mandate but does not render programs meaningless. For example, IIRC in C++ the order in which g(), h(), and i() are evaluated in f(g(), h(), and i()) is unspecified - an implementation can pick any order, and the order doesn't have to be consistent, but no matter the order the program is valid (approximately speaking).

Great example.

So this "unspecified behavior" might turn into the more nasal demon type when g(), h() and i() share mutable state and assume some particular sequential order of execution. No?

Not necessarily. Unspecified behavior and undefined behavior are independent concepts; a language can have one but not the other. As a result, you can have languages where incorrect reliance on unspecified behavior can lead to undefined behavior (e.g., C and C++) and languages where incorrect reliance on unspecified behavior can lead to bugs, but not nasal demons (e.g., Java)

That would depend entirely on the assumptions being made and the constructs being used. I think in most cases it would likely just result in regular garden variety bugs.

But sure, if you're writing C++ and (for example) g is depended on for initialization of pointed memory that the other two consume you could end up with UB. But if you're writing Java then no, you will not end up with UB just buggy code.

It is only allowed in unsafe blocks. As long as the unsafe blocks are few and well understood then Rust programmers can contain this to a small well defined portion of a program.

Safe Rust does.

Unsafe Rust allows you to tell the compiler “hold my beer”. It’s a concession to the reality that the normal restrictions of Rust disallow some semantically valid programs that you might otherwise want to write. The safeguards work great in most cases, but in some they’re overly restrictive.

In practice, the overwhelming majority of code is able to be written in safe Rust and the compiler can have your back. The majority of the rest is for performance reasons, interacting with external functions like C libraries over FFI, or expressing semantics that safe Rust struggles with (e.g., circular references).

OK but the title says "in safe Rust". Am I misunderstanding something? All the replies here are saying how it's allowed in unsafe Rust, which is not what the title says.

If code in an unsafe block triggers undefined behavior, then the assumptions the compiler makes regarding safety will no longer be true, and purely safe code (code with no unsafe blocks) is no longer guaranteed to be safe. This is what's happening in the example the person on Github wrote in the issue.

Exactly and "[...]and purely safe code (code with no unsafe blocks) is no longer guaranteed to be safe" hits the nail on the head.

I take issue with the phrasing of OP's title: "allows for UB in safe rust". AFAIK there are compiler bugs that allow UB in safe Rust, but this is not what is happening here. We have UB in an unsafe block (which is to be expected) which enables an issue outside in safe code. What is your opinion? Is calling this "UB in safe Rust" justified?

it is, but it's a little confusing here because the library/consumer of the library are the same person.

This is a bug in the library, namely in Bun's PathString implementation. The bug is a soundness issue, precisely because usage of Bun's PathString implementation allows for UB in safe rust. Now this buggy library isn't that big of a concern for the community, because Bun is the only consumer. It's not also an indication of a compiler bug, because Bun's library is implemented using unsafe rust. But the fundamental issue is that usage of Bun's PathString implementation allows for UB in safe rust, and is therefore (clearly) unsound.

Suppose I initialize something in an unsafe block. I promise the compiler that it's properly initialized, but in reality it isn't. Importantly I never make use of the garbage values in the unsafe block so no UB has occurred - yet.

Later, the garage enters otherwise safe machinery and triggers UB. UB has now happened in safe rust as a result of my earlier contractual violation.

You can extend this example to other scenarios where UB in unsafe begets further UB in safe later on.

`unsafe` isn't viral. I can write

fn safe_function(...) -> (...) {

    // do unsafe things here
}

then `safe_function` can be called from safe code, and still trigger UB. This wouldn't be a soundness issue in the rust compiler, but instead a bug in safe_function.

There are many reasons you might want to do that. In particular, it's very common in rust to have a library define some data structure that uses unsafe under-the-hood, but checks whatever invariants it needs to, and provides solely safe methods to external callers. Rust's `String` type is like this: it's (roughly) a `Vec<u8>`, e.g. heap-allocated bytes. It has the additional invariant that these bytes correspond to valid UTF8 though. See for example `push_str_slice`, which (roughly) concatenates 2 strings.

https://doc.rust-lang.org/src/alloc/string.rs.html#1107

It does the following thing

1. reserve enough space for the concatenated string within the source string 2. does some pointer arithmetic and a call to Rust's equivalent to `memcpy` (unsafe) 3. re-casts this pointer to a string object without checking that it's valid utf8 (unsafe).

While these individual calls are unsafe, `push_str_slice` checks that in this particular situation they are safe, so the stdlib authors do not mark `push_str_slice` as unsafe. It has no invariants that must be maintained by external callers.

Unsafe blocks are you saying to the compiler 'trust me bro, I know this is safe'. But often that relies on some property of the code being true in order for it to actually be safe. Generally speaking, the expectation in rust is that you either encapsulate the code that enforces whatever property you are relying on behind a safe interface, so that it's not possible for other code to use it unsafely, or that you mark the interface itself unsafe so that it's obvious that the code using that interface needs to maintain that property itself. Rust code that doesn't do this will generally be considered buggy by most rust programmers (e.g. if you find a use of safe interfaces in the stdlib that causes a memory safety violation, then you should file a ticket with the rust team), but this is essentially only a social convention of where the blame lies for a bug, not something that compiler itself can enforce (and, for example, you can violate memory safety in rust with only safe std interface by abusing OS interfaces like /proc/self/mem but this is something that most people don't think can be reasonably fixed). The main reason that rust as a language is better in this regard is that it gives much better tools for being able to express that safe interface without giving up performance and that it has the means to mark and encapsulate this safe/unsafe distinction.

Here's some links on this topic which have some examples:

https://doc.rust-lang.org/nomicon/working-with-unsafe.html https://www.ralfj.de/blog/2016/01/09/the-scope-of-unsafe.htm...

They are using unsafe since large portions of Bun is interfacing with other unsafe codebases. Together with a "1:1" rewrite from Zig to Rust.

And it's not like Bun when written in Zig has been a beacon of stability either. It has been segfaults all over the place.