No, without more context I don't immediately see how it would fall flat. If you're dealing with larger data, like gigabytes of it, it might be better to use chunks of 1 MB or something like that. My point is that stream processing is probably best done in chunks. Otherwise, you have to fit all the transformed data in RAM at once, even when it's ephemeral. Wasting gigabytes for no reason isn't great.
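To make the chunking idea concrete, here's a minimal sketch in Rust; the 1 MiB chunk size and the process_chunk function are just placeholders for whatever transformation the pipeline actually does:

    use std::fs::File;
    use std::io::{self, Read};

    // Process a file in fixed-size chunks instead of loading it all into RAM.
    fn stream_in_chunks(path: &str) -> io::Result<()> {
        const CHUNK_SIZE: usize = 1 << 20; // 1 MiB; tune for the workload
        let mut file = File::open(path)?;
        let mut buf = vec![0u8; CHUNK_SIZE];
        loop {
            let n = file.read(&mut buf)?;
            if n == 0 {
                break; // EOF
            }
            process_chunk(&buf[..n]); // transform/consume only this chunk
        }
        Ok(())
    }

    // Placeholder for whatever per-chunk transformation is actually needed.
    fn process_chunk(_chunk: &[u8]) {}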
Because you don’t know a priori whether any particular piece of data is still needed, and linking is not a stream processing problem: you have to access arbitrary parts of the “stream” multiple times, in random order. Almost any part of the program can end up linking to another, so you do end up needing to keep it all in memory. And modern linkers need to do LTO, which requires loading all compilation units into memory at once to perform that optimization.
But sure, if you’re that confident, go for it. Writing a linker that can use a fraction of the memory, especially during LTO, would be a groundbreaking achievement.
From a skim, my impression was that there was a specific stream processing problem being discussed, but I've probably gotten that wrong, or a comment was edited. If I understand correctly now, in the OP, Rust's streaming features are being used to achieve what is essentially a typecast, and the streaming transformations (looking at each individual element) are expected to be optimized away. And I restate that this is essentially solving a language-inflicted problem, not some interesting performance problem.
For sure, linking "graphs" generally have random connections, so obviously I'm not saying that linking in general is a trivial stream processing problem.
Thinking about it though, linking graphs are _almost_ free of cyclic dependencies in practice (even when looking at entire chunks of code or whole object files as units), so there are some basic tricks we can use to avoid loading all compilation units into RAM at once, which I think is what you claimed we need to do.
The way I understand it, linking is essentially concatenating object files (or rather the sections contained in them) into a final executable. But while doing so, relocations must be applied, so object code must not be written out before all symbols referenced by that code are known. This can be done in a granular fashion, based on fixed-size blocks. You collect all relocations in per-block lists (unsorted, think std::vector, but it can be optimized). When a symbol or code location becomes known (e.g. by looking at the library that provides the required symbol), that information gets appended to the list. When a list is complete (i.e. all its relocations are known), that block can be committed to the file while applying all fixups. Applying fixups is still random writes, but only within fixed-size blocks.
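Roughly like this, as a sketch (the names and fields are made up, and a real linker tracks much more per relocation, e.g. the relocation kind and addend):

    // One pending fixup inside a block.
    struct Relocation {
        offset_in_block: usize, // where the patch goes within the block
        symbol: String,         // symbol whose address we still need
        resolved: Option<u64>,  // filled in once the symbol is seen
    }

    // A fixed-size chunk of section data plus its unresolved relocations.
    struct PendingBlock {
        data: Vec<u8>,
        relocations: Vec<Relocation>,
    }

    impl PendingBlock {
        // Record a now-known symbol address; returns true once every
        // relocation in this block is resolved and it can be committed.
        fn resolve(&mut self, symbol: &str, addr: u64) -> bool {
            for r in &mut self.relocations {
                if r.resolved.is_none() && r.symbol == symbol {
                    r.resolved = Some(addr);
                }
            }
            self.relocations.iter().all(|r| r.resolved.is_some())
        }

        // Apply all fixups (random writes, but only within this block)
        // and hand the finished bytes to the output writer.
        fn commit(mut self, write_out: impl FnOnce(&[u8])) {
            for r in &self.relocations {
                let addr = r.resolved.expect("commit before full resolution");
                // Pretend every relocation is an 8-byte absolute address;
                // real relocation kinds vary, of course.
                self.data[r.offset_in_block..r.offset_in_block + 8]
                    .copy_from_slice(&addr.to_le_bytes());
            }
            write_out(&self.data);
        }
    }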
That way, the problem is almost reduced to streaming fixed-size blocks in practice. Typically, all symbols that an object file depends on are already resolved by the object files that were added before it, so most new blocks we look at can immediately be fixed up and streamed to the output file. (In fact, some linkers are rather strict about the order of the objects you pass on the command line.) What does grow more or less linearly, though, is the set of accumulated symbols (e.g., name + offset) as we add more and more object files to the output executable.
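And a sketch of the driving loop, reusing the PendingBlock/Relocation types from above; ObjectFile here is just a stand-in, not a real object-file parser:

    use std::collections::HashMap;
    use std::io::Write;

    // Stand-in for a parsed object file.
    struct ObjectFile {
        defined: Vec<(String, u64)>, // symbols this object provides (name, address)
        blocks: Vec<PendingBlock>,   // its section data, already split into blocks
    }

    fn link(objects: Vec<ObjectFile>, out: &mut impl Write) {
        // The only state that keeps growing with the input: name -> address.
        let mut symbols: HashMap<String, u64> = HashMap::new();
        // Blocks still referencing symbols we haven't seen yet (few, in practice).
        let mut waiting: Vec<PendingBlock> = Vec::new();

        for obj in objects {
            // Newly learned symbols may complete blocks that were parked earlier.
            for (name, addr) in obj.defined {
                let mut i = 0;
                while i < waiting.len() {
                    if waiting[i].resolve(&name, addr) {
                        waiting.swap_remove(i).commit(|b| { let _ = out.write_all(b); });
                    } else {
                        i += 1;
                    }
                }
                symbols.insert(name, addr);
            }

            // Common case: all of a block's symbols were seen already, so it
            // gets fixed up and streamed straight to the output.
            for mut block in obj.blocks {
                let mut done = true;
                for r in &mut block.relocations {
                    match symbols.get(&r.symbol) {
                        Some(&addr) => r.resolved = Some(addr),
                        None => done = false,
                    }
                }
                if done {
                    block.commit(|b| { let _ = out.write_all(b); });
                } else {
                    waiting.push(block);
                }
            }
        }
        // (A real linker also has to care about output ordering and final layout;
        // this ignores that to keep the focus on memory use.)
    }

In practice, waiting stays small as long as the inputs come in roughly dependency order, which is exactly the ordering strictness some linkers enforce on the command line.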
I don't know a lot about LTO, but it seems like just an extension of the same problem. Balancing memory consumption and link times against output performance is partly the user's job. Most projects should probably just disable LTO, even for release builds. But making LTO fast in a linker is probably all about using the right approach: doing things in the right order and grouping work to avoid redundant reads and writes (both to RAM and to disk).
> And I restate that this is essentially solving a language-inflicted problem, not some interesting performance problem.
Sort of yes, sort of no. Yes, in that the main alternative (std::mem::transmute) is unsafe, so if you want to avoid dealing with that (e.g., "I personally like the challenge of doing things without unsafe if I can, especially if I can do so without loss of performance"), then this can indeed be described as a language-inflicted problem. No, in that this is arguably not an "inherent" language-inflicted problem: there is ongoing work toward letting the Rust compiler guarantee that certain transmutes will work as expected in safe code, and I want to say this would be one of those cases.
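To make the trade-off concrete, a minimal illustration (not the OP's actual code): reinterpreting a slice of u32 as a slice of i32, i.e. two element types with identical layout:

    // The transmute route: zero runtime work, but unsafe, and the layout
    // argument is entirely on you.
    fn view_as_i32(xs: &[u32]) -> &[i32] {
        // SAFETY (asserted by the programmer): u32 and i32 have identical
        // size, alignment and validity, and the two slice references have
        // the same representation.
        unsafe { std::mem::transmute(xs) }
    }

    // The safe "streaming" route: walk the elements and collect. Semantically
    // this allocates and copies, but the per-element map is the kind of thing
    // the optimizer can often reduce to a plain memcpy (or less), which is
    // the bet being made in the OP.
    fn collect_as_i32(xs: &[u32]) -> Vec<i32> {
        xs.iter().map(|&x| x as i32).collect()
    }

The safe-transmute work mentioned above is essentially about letting the compiler check the layout argument in the first version, so the unsafe block (and the streaming workaround) becomes unnecessary.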