To clarify a couple of things:

I'm aware of the reasons Linux uses doubly linked lists. I'm still of the opinion this design decision made a lot of sense in the 1990s, and would make less sense in a new project today. You may disagree, and that's fine! Engineering is constrained by hard truths, but within those constraints, is full of judgment calls.

I'm not a member of the evangelical Rust community. I'm not proclaiming 'thou shalt rewrite it all in Rust'. Maybe you have good reasons to use something else. That's fine.

But if you are considering Rust, and are concerned about its ability to handle cyclic data structures, there are ways to do it that don't involve throwing out all the benefits the language offers.

Linux's use of pointer-heavy data structures is largely justified even today. The low-level kernel bookkeeping requirements rule out many typical suggestions. Modern hardware suggests certain patterns in broad strokes, which are sound for almost any software. But the kernel has the unique role of multiplexing the hardware, instead of just running on the hardware. It is less concerned with, e.g., how to crunch numbers quickly than it is with how to facilitate other software crunching numbers quickly. As someone else noted elsewhere in the top-level thread, unsafe Rust is the appropriate compromise for things that must be done but aren't suitable for safe Rust. Unsafe Rust is one of the best realizations of engineering tradeoffs that I've seen, and neither C nor safe Rust justifies its absence. Rust is only a "systems" language with unsafe, and that doesn't drag Rust down but rather strengthens it. "Here is the ugliness of the world, and I will not shy from it."

Beautifully put, poetic even. The only problem is that, in (*actual).reality, unsafe Rust is difficult to read, difficult to write, difficult to maintain, difficult to learn, has zero features for ergonomics, and has poorly defined rules.

Compared with unsafe Rust, C and Zig have a big advantage for writing maintainable, easy-to-read unsafe code.

> The only problem is that, in (*actual).reality, unsafe Rust is difficult to read, difficult to write, difficult to maintain, difficult to learn, has zero features for ergonomics

All true.

> and has poorly defined rules.

The rules aren't so poorly documented. The options for working with or around them, such as using UnsafeCell, do admittedly seem scattered across the internet.
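
For what it's worth, the UnsafeCell pattern itself is small. A minimal sketch (the `Counter` type is made up for illustration): UnsafeCell is the one primitive that makes mutation through a shared reference legal, and everything else (Cell, RefCell, Mutex) is built on top of it.

```rust
use std::cell::UnsafeCell;

// Hypothetical example type, not from any real codebase.
struct Counter {
    value: UnsafeCell<u32>,
}

impl Counter {
    fn new() -> Self {
        Counter { value: UnsafeCell::new(0) }
    }

    // Safe wrapper: sound because no reference to the inner value escapes,
    // and Counter is !Sync (UnsafeCell is not Sync), so there are no data
    // races to worry about.
    fn increment(&self) -> u32 {
        unsafe {
            let p = self.value.get(); // *mut u32, valid while &self lives
            *p += 1;
            *p
        }
    }
}

fn main() {
    let c = Counter::new();
    assert_eq!(c.increment(), 1);
    assert_eq!(c.increment(), 2);
}
```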

But you omitted a key point: unlike C and Zig, unsafe code in Rust is contained. In the Rust std library, for example, there are 35k functions, of which 7.5k are unsafe. In C or Zig, all 35k would be unsafe. If you are claiming those 35k unsafe functions in C or Zig would be easier to maintain safely than the 7.5k unsafe functions in Rust, I'd disagree.
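
The containment claim can be made concrete: the unsafe block lives inside a safe function whose own logic establishes the invariant, so the audit surface is the function body, not every caller. A minimal sketch using the std slice API (`first_half` is a made-up example function):

```rust
// A safe function with an unsafe interior: no caller can violate memory
// safety, because `mid` is derived from the slice's own length.
fn first_half(s: &[u32]) -> &[u32] {
    let mid = s.len() / 2;
    // SAFETY: mid <= s.len() by construction, so the range is in bounds.
    unsafe { s.get_unchecked(..mid) }
}

fn main() {
    assert_eq!(first_half(&[1, 2, 3, 4]), &[1, 2]);
    assert_eq!(first_half(&[1]), &[] as &[u32]);
}
```

In C or Zig the equivalent function compiles just the same, but nothing marks it as the place where the in-bounds reasoning lives.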

I agree that unsafe Rust is not comfortable or simple, but I think it is usable when appropriately applied. It should only rarely be used as a performance optimization. The Rust devs are heavily invested in minimizing unsoundness in practice, particularly with Miri. In the coming years, I expect unsafe to be improved further. Across the whole ecosystem, Rust with judicious use of unsafe is, in my opinion, vastly superior to C, unless the C is developed stringently, which is rare.

So, as an example, you'd happily spend an extra allocation and extra pointers of space for each item in a list, even when the item type itself is only a couple of bytes and you potentially need many millions of items? Just so your design is not "from the nineties"?

An extrusive list needs at least one more pointer (to the item, separate from the link node), and possibly an additional backpointer from the item to the link node if you want to unlink that node in O(1). It also adds allocation overhead and cache misses.

Intrusive lists are one of the few essential tools to achieve performance and low latency.
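
To illustrate the space argument, here is a layout-only sketch with hypothetical types (`Task`, `ListNode`, `Item` are made up): intrusive links live inside the object itself, while the extrusive variant pays for a second allocation, a pointer to the item, and a backpointer for O(1) unlink.

```rust
#![allow(dead_code)]

// Intrusive: the links live inside the object. One allocation per item,
// and the object is reachable from its links at zero extra cost.
struct Task {
    id: u64,
    prev: *mut Task,
    next: *mut Task,
}

// Extrusive: the link node is a second allocation holding a pointer to
// the item, and the item needs a backpointer to its node to unlink in O(1).
struct ListNode {
    item: *mut Item,
    prev: *mut ListNode,
    next: *mut ListNode,
}

struct Item {
    id: u64,
    node: *mut ListNode,
}

fn main() {
    use std::mem::size_of;
    // On a 64-bit target: 24 bytes intrusive vs 24 + 16 = 40 bytes
    // extrusive, plus the second allocation and its likely cache miss.
    assert!(size_of::<ListNode>() + size_of::<Item>() > size_of::<Task>());
}
```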

Or were you thinking of dynamically reallocating vectors? They are not an alternative, they are almost completely unusable in hardcore systems programming. Reallocating destroys pointer stability and adds latency, both very bad for concurrency.

I’m sorry, I did not intend to accuse you of being part of the evangelical community. Your article only prompted the thought I shared.

On the technical point, I think I do disagree, but I'm open to changing my mind. What would be better? I'm currently working on an async runtime, written in C, and I'm using several intrusive doubly linked lists because of the properties I mentioned.

No problem!

As to what would be better - this is also a reply to your sibling comments above - I don't have a single across-the-board solution; the equivalent of std::vector everywhere is fine for some kinds of application code, but not necessarily for system code. Instead, I would start by asking questions.

What kinds of entities are you dealing with, what kinds of collections, and, critically, how many entities along each dimension, to an order of magnitude, at p50 and p99? What are your typical access patterns? What are your use cases, so that you can figure out which figures of merit to optimize for? How unpredictable is the addition of new use cases likely to be?

In most kinds of application code, it's okay to just go for big-O, but for performance-critical system code, you also need to care about constant factors. As an intuition primer: how many bytes can you memcpy in the time it takes to service one cache miss? If your intuition for performance was trained in the eighties and nineties, as mine initially was, the answer may be larger than you expect.
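
A back-of-envelope version of that primer, with assumed round numbers rather than measurements (real figures vary a lot by machine): at roughly 100 ns for a miss to DRAM and roughly 20 GB/s of sequential copy bandwidth, one cache miss is worth on the order of kilobytes of memcpy.

```rust
fn main() {
    // Assumed figures, not measurements:
    let miss_ns = 100.0;           // ~100 ns DRAM access latency
    let copy_bytes_per_ns = 20.0;  // ~20 GB/s sequential memcpy bandwidth
    let bytes_per_miss = miss_ns * copy_bytes_per_ns;
    assert_eq!(bytes_per_miss, 2000.0); // roughly 2 KB copied per miss
}
```

In the eighties and nineties the equivalent number was closer to a handful of bytes, which is why pointer-chasing intuitions from that era mislead today.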

Even if you just go for big-O, don't forget that a resizable array won't give you even amortized O(1) delete in many cases: deleting from the middle is O(n) unless you can tolerate reordering. That alone is likely prohibitive unless you can bound the number of elements in the container to a small number.
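
The trade-off in std terms, as a minimal sketch: `Vec::remove` preserves order but shifts everything after the index (O(n)), while `Vec::swap_remove` is O(1) but reorders the vector and invalidates the index of the element it moved.

```rust
fn main() {
    let mut v = vec![10, 20, 30, 40];
    v.remove(1);                  // O(n): shifts 30 and 40 left
    assert_eq!(v, [10, 30, 40]);

    let mut w = vec![10, 20, 30, 40];
    w.swap_remove(1);             // O(1): moves the last element (40) into slot 1
    assert_eq!(w, [10, 40, 30]);
}
```

Neither variant preserves element addresses across a reallocation, which is the pointer-stability problem mentioned above.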

And if you're trying to trade away good big-O for better cache locality, don't forget that in many cases, you're dealing with stateful objects that need to be put into the list. That means you likely need to have a list or queue of pointers to these objects. And no matter how flat or cache-friendly the queue is, adding this indirection is similarly cache-unfriendly whenever you have to actually access the state inside the container.

Or unless delete is a rare operation. So yeah, to make the best decisions here, you need to know expected numbers as well as expected access patterns.

As far as I can see, for any design other than 'temporarily flag the object deleted and sweep deleted objects in bulk later' (which is only good when deletions are uncommon), you are indeed going to incur at least one extra memory access apart from the object itself. It still matters how many extra accesses: deleting an object from a doubly linked list touches two other objects.
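
To make that count concrete, here is a minimal unsafe sketch of the doubly-linked unlink (the `Node` type is hypothetical): the two writes land in the neighbor objects, so those are the two extra, potentially cache-missing, accesses.

```rust
#![allow(dead_code)]

struct Node {
    value: u32,
    prev: *mut Node,
    next: *mut Node,
}

// Unlinking `n` reads n's own links, then writes into n.prev and n.next:
// two accesses to *other* objects, each a potential cache miss.
unsafe fn unlink(n: *mut Node) {
    let prev = (*n).prev;
    let next = (*n).next;
    if !prev.is_null() {
        (*prev).next = next;
    }
    if !next.is_null() {
        (*next).prev = prev;
    }
    (*n).prev = std::ptr::null_mut();
    (*n).next = std::ptr::null_mut();
}

fn main() {
    let null = std::ptr::null_mut();
    // Build a three-node list a <-> b <-> c on the heap, then unlink b.
    let a = Box::into_raw(Box::new(Node { value: 1, prev: null, next: null }));
    let b = Box::into_raw(Box::new(Node { value: 2, prev: null, next: null }));
    let c = Box::into_raw(Box::new(Node { value: 3, prev: null, next: null }));
    unsafe {
        (*a).next = b; (*b).prev = a;
        (*b).next = c; (*c).prev = b;
        unlink(b);
        assert_eq!((*a).next, c);
        assert_eq!((*c).prev, a);
        // Reclaim the heap nodes.
        drop(Box::from_raw(a));
        drop(Box::from_raw(b));
        drop(Box::from_raw(c));
    }
}
```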

It also matters somewhat how many cache lines each object takes up. I say 'somewhat' because even if an object is bulky, you might be able to arrange it so that the most commonly accessed fields fit in one or two cache lines at the beginning.