> The last time I used [a doubly linked list] was in the nineties. I know the Linux kernel uses them, but that design was also laid down in the nineties; if you were designing a kernel from scratch today, you would probably not do it that way.
This and sibling comments about the supposed uselessness or outdatedness of linked lists taught me something about the disconnect between many systems engineers resistant to Rust and many members of the (evangelical) Rust community.
To address the technical topic first: a doubly linked list is what you reach for when you’re looking for an intrusive data structure with fast append and fast delete. You need to iterate over it, but don’t need random access. Queues where you may need to remove elements. Think a list of completions that can be cancelled.
Why an intrusive data structure? Because you don’t want to, or can’t allocate. Because you want only a non-owning reference to the object, and you don’t want to have to manage any other lifetimes or dynamic (potentially unbounded, or arbitrarily bounded) allocations. Intrusive data structures are a great tool to minimize allocations, and to keep allocations (and ownership) logically separated across modules.
And now onto the cultural topic. That a member of the Rust community might not know this (which is of course totally fine) is what taught me something, which was maybe obvious to many: Rust brings safety to a C++ style language and audience. For the same reasons many dislike C++, many dislike Rust. For the same reasons many would opt for C over C++ when they have the choice (either across all their projects or for certain projects), many continue to opt for C over Rust.
Rust will not be taking over systems programming for the same reasons C++ did not. Well, by the numbers, it probably did (and Rust probably will too), so maybe better to say that it will not stamp out C for systems programming for the same reasons C++ did not. And it’s not just because of the stubbornness of C programmers.
Systems programming has 2 cultures and practice groups: the C camp, and the C++ camp. The C++ camp is a big tent. I’d argue it includes Rust and Swift (and maybe D). Zig belongs to the C camp.
Safety is important, and maybe Rust has the right model for it, but it is not a solution for the C camp. It likely did not intend to be, and again, by the numbers, replacing C++ is probably the more important task.
To clarify a couple of things:
I'm aware of the reasons Linux uses doubly linked lists. I'm still of the opinion this design decision made a lot of sense in the 1990s, and would make less sense in a new project today. You may disagree, and that's fine! Engineering is constrained by hard truths, but within those constraints, is full of judgment calls.
I'm not a member of the evangelical Rust community. I'm not proclaiming 'thou shalt rewrite it all in Rust'. Maybe you have good reasons to use something else. That's fine.
But if you are considering Rust, and are concerned about its ability to handle cyclic data structures, there are ways to do it, that don't involve throwing out all the benefits the language offers.
Linux's use of pointer-heavy datastructures is largely justified even today. The low-level kernel bookkeeping requirements rule out many typical suggestions. Modern hardware suggests certain patterns in broad strokes, which are sound for mostly any software. But the kernel has the unique role of multiplexing the hardware, instead of just running on the hardware. It is less concerned with, e.g., how to crunch numbers quickly than it is with how to facilitate other software crunching numbers quickly. As someone else noted elsewhere in the top-level thread, unsafe Rust is the appropriate compromise for things that must be done but aren't suitable for safe Rust. Unsafe Rust is one of the best realizations of engineering tradeoffs that I've seen, and neither C nor safe Rust justify its absence. Rust is only a "systems" language with unsafe, and that doesn't drag Rust down but rather strengthens it. "Here is the ugliness of the world, and I will not shy from it."
Beautifully put, poetic even. The only problem is that, in (*actual).reality, unsafe Rust is difficult to read, difficult to write, difficult to maintain, difficult to learn, has zero features for ergonomics, and has poorly defined rules.
C and Zig have a big advantage for writing maintainable, easy to read unsafe code over unsafe Rust.
> The only problem is that, in (*actual).reality, unsafe Rust is difficult to read, difficult to write, difficult to maintain, difficult to learn, has zero features for ergonomics
All true.
> and has poorly defined rules.
The rules aren't so poorly documented. The options for working with/around them, like say using UnsafeCell, do seem to be scattered across the internet.
But you omitted a key point: unlike C and Zig, unsafe code in Rust is contained. In the Rust std library for example, there are 35k functions, 7.5k are unsafe. In C or Zig, all 35k would be unsafe. If you are claiming those 35k unsafe lines in C or Zig would be easier to maintain safely than those 7.5k lines of unsafe Rust, I'd disagree.
I agree that unsafe Rust is not comfortable or simple, but I think it is usable when appropriately applied. It should very scarcely be used as a performance optimization. The Rust devs are quite invested in minimizing unsoundness in practice, particularly with Miri. In coming years, I expect unsafe to be further improved. Over the entire ecosystem, Rust with judicious use of unsafe is, in my opinion, vastly superior to C, unless the C is developed stringently, which is rare.
So, as an example, you'd be happily spending an extra allocation and extra pointers of space for each item in a list, even when that item type itself is only a couple of bytes, and you potentially need many millions of that type? Just so your design is not "from the nineties"?
An extrusive list needs at least 1 more pointer (to the item, separate from the link node), and possibly an additional backpointer to the link node when you want to unlink that node. It also adds allocation overhead and cache misses.
Intrusive lists are one of the few essential tools to achieve performance and low latency.
Or were you thinking of dynamically reallocating vectors? They are not an alternative, they are almost completely unusable in hardcore systems programming. Reallocating destroys pointer stability and adds latency, both very bad for concurrency.
I’m sorry, I did not intend to accuse you of being part of the evangelical community. Your article only prompted the thought I shared.
On the technical point, I think I do disagree, but open to changing my mind. What would be better? I’m working on an async runtime currently, written in C, and I’m using several intrusive doubly linked lists because of their properties I mentioned.
No problem!
As to what would be better - this is also a reply to your sibling comments above - I don't have a single across-the-board solution; the equivalent of std::vector everywhere is fine for some kinds of application code, but not necessarily for system code. Instead, I would start by asking questions.
What kinds of entities are you dealing with, what kinds of collections, and, critically, how many entities along each dimension, to an order of magnitude, p50 and p99? What are your typical access patterns? What are your use cases, so that you can figure out what figures of merit to optimize for? How unpredictable will be the adding of more use cases in the future?
In most kinds of application code, it's okay to just go for big-O, but for performance critical system code, you also need to care about constant factors. As an intuition primer, how many bytes can you memcpy in the time it takes for one cache miss? If your intuition for performance was trained in the eighties and nineties, as mine initially was, the answer may be larger than you expect.
Even if you just go for big-O, don't forget that a resizable array won't give you even amortized O(1) delete in many cases. This alone is likely prohibitive unless you can bound the elements in the container to a small number.
And if you're trying to trade away good big-O for better cache locality, don't forget that in many cases, you're dealing with stateful objects that need to be put into the list. That means you likely need to have a list or queue of pointers to these objects. And no matter how flat or cache-friendly the queue is, adding this indirection is similarly cache-unfriendly whenever you have to actually access the state inside the container.
Or unless delete is a rare operation. So yeah, to make the best decisions here, you need to know expected numbers as well as expected access patterns.
As far as I can see, you are indeed going to incur one extra memory access apart from the object itself, for any design other than just 'Temporarily flag the object deleted, sweep deleted objects in bulk later' (which would only be good if deleted objects are uncommon). It still matters how many extra memory accesses; deleting an object from a doubly linked list accesses two other objects.
It also matters somewhat how many cache lines each object takes up. I say 'somewhat' because even if an object is bulky, you might be able to arrange it so that the most commonly accessed fields fit in one or two cache lines at the beginning.
The Rust for Linux project has designed safe intrusive linked lists for use in the kernel, and while they're inconvenient to use, this and other things have led to the push to creating language features to improve the scenario.
As for the C camp, I agree it's different. The problem is that we don't know a way to design memory safe GC-free language without it being big and complex. Maybe it is possible. But until we figure out how, projects that need to be memory safe (which I believe is the vast majority of projects, although not all of them) will probably use Rust (and probably should use Rust), even if they would prefer to be pure C, because memory safety is just more important.
> To address the technical topic first: a doubly linked list is what you reach for when you’re looking for an intrusive data structure with fast append and fast delete. You need to iterate over it, but don’t need random access. Queues where you may need to remove elements. Think a list of completions that can be cancelled. > > Why an intrusive data structure? Because you don’t want to, or can’t allocate. Because you want only a non-owning reference to the object, and you don’t want to have to manage any other lifetimes or dynamic (potentially unbounded, or arbitrarily bounded) allocations. Intrusive data structures are a great tool to minimize allocations, and to keep allocations (and ownership) logically separated across modules.
If you really wanted this in Rust, you could probably get away with just initialising a Vec::with_capacity(), then have each element’s next and prev be indexes into the vec.
That would be the “arbitrarily bounded” allocation. Arbitrary because now you have to make a decision about how many items you’re willing to maintain despite that number being logically determined by a sum over an unknown set of modules.
Vec size is dynamic, so no.
Now you’re allocating in an ISR?
Rust won’t even succeed at replacing C++. There are technical and cultural issues at play. C++ has evolved a lot and still does some things better than Rust (dynamic linking, for example). Rust will be another popular systems programming language. There will be better ones. People will take their picks.
I find C++ friendlier for small hobby projects because it lets you get your code compiled even if it's a little messy or slightly unsafe.
As for "serious code", I admit that Rust is maybe better for low-level security-critical stuff. But for native applications it's on par thanks to smart pointers and also just -fsanitize=address.
(also default copy semantics just seem more intuitive i dunno why)
While Rust is certainly competing against C++ moreso than C, I think Rust has qualities that make it suitable to rewrite old C programs or write new programs that would be written in C. Drawbacks of Rust such as excessive dependency usage, simply feeling too complex to write and read, overuse of the type system, or the subpar low-level hardware correspondence, are not huge issues in the right subculture. I think Rust is not quite C but offers a distinct path to do what C does. In any case, legacy C programs do need a pick-me-up, probably many from different technologies, and I think Zig is at most a small factor.