This was 20+ years ago, so the "sophisticated" baseline wasn't ML or AI.
I was looking into an initial implementation and use of order files for a major platform. Quick recap: C (and similar languages) define that every function must have a unique address, but place no constraints on the relative order of those addresses. Choosing the order in which functions appear in memory can have significant performance impact. For example, suppose that you access 1,000 functions over a run of a program, each of which is 100 bytes in size. If each of those functions is mixed in with the 100,000 functions you don't call, you touch (and have to read from disk) 1,000 pages; if they're all directly adjacent, you touch 25 pages. (This is a superficial description -- the thousand "but can't you" and "but also"s in your mind right now are very much the point.)
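The page math above can be sketched in a few lines. This assumes 4 KiB pages (the post doesn't name a page size, but 25 pages for 100 KB of hot code implies it):

```python
import math

# Assumption: 4 KiB pages; the original doesn't state a page size,
# but 100,000 bytes of hot code fitting in 25 pages implies it.
PAGE_SIZE = 4096
n_hot_funcs = 1000   # functions actually called during the run
func_size = 100      # bytes each

# Worst case: every hot function sits on its own page, surrounded
# by cold code, so each call can fault in a fresh page.
scattered_pages = n_hot_funcs

# Best case: the hot functions are packed back to back.
packed_pages = math.ceil(n_hot_funcs * func_size / PAGE_SIZE)

print(scattered_pages, packed_pages)  # 1000 25
```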
I went into this with moderately high confidence that runtime analysis was going to be the "best" answer, but figured I'd start by seeing how much of an improvement static analysis could give -- this would provide a lower bound for the possible improvement to motivate more investment in the project, and would give immediate improvements as well.
So, what are all the ways you can use static analysis of a (large!) C code base to figure out order? Well, if you can generate a call graph, you can lay functions out depth first or breadth first, both of which have theoretical arguments for them -- or you can factor in function sizes, page size, page read-ahead size, etc., and do a mixture based on chunking to those sizes... and then you can do something like an annealing pass, since a 4,097-byte sequence is awful (one byte spills onto a second page) and you're better off swapping something out for a slightly-less-optimal-but-single-page sequence, etc.
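The simplest of those call-graph layouts can be sketched as a depth-first walk, so callees land next to their first caller. The graph here is a made-up example, not the platform's real call graph:

```python
# A minimal sketch of depth-first layout from a call graph: emit each
# function the first time the walk reaches it, so callees end up
# adjacent to their callers in the order file.
def dfs_order(call_graph, roots):
    """Return functions in depth-first order from the given entry points."""
    order, seen = [], set()

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        order.append(fn)
        for callee in call_graph.get(fn, ()):
            visit(callee)

    for root in roots:
        visit(root)
    return order

# Hypothetical call graph for illustration.
call_graph = {
    "main": ["parse_args", "run"],
    "run": ["load_config", "process"],
    "process": ["load_config"],  # shared callee is placed once, near its first caller
}
print(dfs_order(call_graph, ["main"]))
# ['main', 'parse_args', 'run', 'load_config', 'process']
```

A breadth-first variant just swaps the recursion for a queue; the chunking and annealing passes would then reorder within and across page-sized bins.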
And to test the tool chain, you might as well do a trivial one. How about we just alphabetize the symbols?
Guess which static approach performed best? Alphabetization, by a large margin. This was entirely due to the fact that (a) the platform in question used symbol name prefixes as namespaces; (b) callers that used part of a namespace tended to use significant chunks of it; and (c) call graph generation across multiple libraries wasn't accurate, so some of the locality patterns implied by the namespaces weren't visible to the other approaches.
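The namespace effect is easy to see in a toy example. The symbol names below are invented, in the spirit of prefix-as-namespace conventions; a plain sort clusters each prefix's functions together:

```python
# Hypothetical symbols using name prefixes as namespaces ("Mem", "Str", ...).
# A plain alphabetical sort groups each namespace contiguously -- exactly
# the locality the inaccurate cross-library call graph couldn't see.
symbols = ["StrCopy", "MemAlloc", "StrLen", "FileOpen", "MemFree", "StrCmp"]
order = sorted(symbols)
print(order)
# ['FileOpen', 'MemAlloc', 'MemFree', 'StrCmp', 'StrCopy', 'StrLen']
```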
The results were amazingly good. I felt amazingly silly.
(Runtime analysis did indeed exceed this performance, significantly.)