One has to wonder if this due to the global memory shortage. ("Oh - changing our memory allocator to be more efficient will yield $XXM dollar savings over the next year").

Facebook had talks already years ago (10+) - nobody was allowed to share real numbers, but several facebook employed where allowed to share that the company has measured savings from optimizations. Reading between the lines, a 0.1% efficiency improvement to some parts of Facebook would save them $100,000 a month (again real numbers were never publicly shared so there is a range - it can't be less than $20,000), and so they had teams of people whose job it was to find those improvements.

Most of the savings seemed to come from HVAC costs, followed by buying less computers and in turn less data centers. I'm sure these days saving memory is also a big deal but it doesn't seem to have been then.

The above was already the case 10 years ago, so LLMs are at most another factor added on.

I've worked on optimizing systems in that ballpark range, memory is worth saving but it isn't necessarily 1:1 with increasing revenue like CPU is. For CPU we have tables to calculate the infra cost savings (we're not really going to free up the server, more like the system is self balancing so it can run harder with the freed CPU), but for memory as long as we can load in whatever we want to (rec systems or ai models) we're in the clear so the marginal headroom isn't as important. It's more of a side thing that people optimizing CPU also get wins in by chance because the skillsets are similar.

I don't have many regrets about having spent my career in (relatively) tiny companies by comparison, but it sure does sound fun to be on the other side for this kind of thing - the scale where micro-optimizations have macro impact.

In startups I've put more effort into squeezing blood from a stone for far less change; even if the change was proportionally more significant to the business. Sometimes it would be neat to say "something I did saved $X million dollars or saved Y kWh of energy" or whatever.

I've heard of some people getting banned from FB to save memory space? Surely that can't be the case but I swear I've seen something like that

> LLMs are at most another factor added on

At most... Think 10x rather than 0.1x or 1x.

On top of cost, they probably cannot get as much memory as they order in a timely fashion so offsetting that with greater efficiency matters right now.

Yeah, identifying single-digit millions of savings out of profiles is relatively common practice at Meta. It's ~easy to come up with a big number when the impact is scaled across a very large numbers of servers. There is a culture of measuring and documenting these quantified wins.

With the reputation of that company, one can wonder a lot of backstories that are even more depressing than a memory shortage.

Not just shortage, any improvements to LLMs/electricity/servers memory footprint is becoming much more valuable as the time goes. If we can get 10% faster, you can easily get a lead in the LLM race. The incentives to transparently improving performance are tremendous

Oooh maybe finally time for lovingly hand-optimized assembly to come back in fashion! (It probably has in AI workloads or so I daydream)

> changing our memory allocator

they've been using jemalloc (and employing "je") since 2009.