> We plan to deliver improvements to [..] purging mechanisms
During my time at Facebook, I maintained a bunch of kernel patches to improve jemalloc purging mechanisms. It wasn't popular in the kernel or the security community, but it was more efficient on benchmarks for sure.
Many programs run multiple threads, allocate in one and free in the other. Jemalloc's primary mechanism used to be: madvise the page back to the kernel and then have it allocate it in another thread's pool.
One problem: this involves zero'ing memory, which has an impact on cache locality and over all app performance. It's completely unnecessary if the page is being recirculated within the same security domain.
The problem was getting everyone to agree on what that security domain is, even if the mechanism was opt-in.
I recently started using Microsoft's mimalloc (via an LD_PRELOAD) to better use huge (1 GB) pages in a memory intensive program. The performance gains are significant (around 20%). It feels rather strange using an open source MS library for performance on my Linux system.
There needs to be more competition in the malloc space. Between various huge page sizes and transparent huge pages, there are a lot of gains to be had over what you get from a default GNU libc.
We evaluated a few allocators for some of our Linux apps and found (modern) tcmalloc to consistently win in time and space. Our applications are primarily written in Rust and the allocators were linked in statically (except for glibc). Unfortunately I didn't capture much context on the allocation patterns. I think in general the apps allocate and deallocate at a higher rate than most Rust apps (or more than I'd like at least).
Our results from July 2025:
rows are <allocator>: <RSS>, <time spent for allocator operations>
app1:
glibc: 215,580 KB, 133 ms
mimalloc 2.1.7: 144,092 KB, 91 ms
mimalloc 2.2.4: 173,240 KB, 280 ms
tcmalloc: 138,496 KB, 96 ms
jemalloc: 147,408 KB, 92 ms
app2, bench1
glibc: 1,165,000 KB, 1.4 s
mimalloc 2.1.7: 1,072,000 KB, 5.1 s
mimalloc 2.2.4:
tcmalloc: 1,023,000 KB, 530 ms
app2, bench2
glibc: 1,190,224 KB, 1.5 s
mimalloc 2.1.7: 1,128,328 KB, 5.3 s
mimalloc 2.2.4: 1,657,600 KB, 3.7 s
tcmalloc: 1,045,968 KB, 640 ms
jemalloc: 1,210,000 KB, 1.1 s
app3
glibc: 284,616 KB, 440 ms
mimalloc 2.1.7: 246,216 KB, 250 ms
mimalloc 2.2.4: 325,184 KB, 290 ms
tcmalloc: 178,688 KB, 200 ms
jemalloc: 264,688 KB, 230 ms
tcmalloc was from github.com/google/tcmalloc/tree/24b3f29.
I’m surprised (unless they replaced the core tcmalloc algorithm but kept the name).
tcmalloc (thread caching malloc) assumes memory allocations have good thread locality. This is often a double win (less false sharing of cache lines, and most allocations hit thread-local data structures in the allocator).
Multithreaded async systems destroy that locality, so it constantly has to run through the exception case: A allocated a buffer, went async, the request wakes up on thread B, which frees the buffer, and has to synchronize with A to give it back.
modern tcmalloc uses per CPU caches via rseq [0]. We use async rust with multithreaded tokio executors (sometimes multiple in the same application). so relatively high thread counts.
If you go into Dr Dobbs, The C/C++ User's Journal and BYTE digital archives, there will be ads of companies whose product was basically special cased memory allocator.
Even toolchains like Turbo Pascal for MS-DOS, had an API to customise the memory allocator.
I remember in the early days of web services, using the apache portable runtime, specifically memory pools.
If you got a web request, you could allocate a memory pool for it, then you would do all your memory allocations from that pool. And when your web request ended - either cleanly or with a hundred different kinds of errors, you could just free the entire pool.
it was nice and made an impression on me.
I think the lowly malloc probably has lots of interesting ways growing and changing.
One of the best parts about GC languages is they tend to have much more efficient allocation/freeing because the cost is much more lumped together so it shows up better in a profile.
Agreed, however there is also a reason why the best ones also pack multiple GC algorithms, like in Java and .NET, because one approach doesn't fit all workloads.
Java has a quite strict max heap setting, it's very uncommon to let it allocate up to 25% of the system memory (the default). It won't grow past that point, though.
Baring bugs/native leaks - Java has a very predictable memory allocation.
When it works. Many programs in GC language end up fighting the GC by allocating a large buffer and managing it by hand anyway because when performance counts you can't have allocation time in there at all. (you see this in C all the time as well)
That's generally a bad idea. Not always, but generally.
It was a better idea when Java had the old mark and sweep collector. However, with the generational collectors (which are all Java collectors now. except for epsilon) it's more problematic. Reusing buffers and objects in those buffers will pretty much guarantees that buffer ends up in oldgen. That means to clear it out, the VM has to do more expensive collections.
The actual allocation time for most of Java's collectors is almost 0, it's a capacity check and a pointer bump in most circumstances. Giving the JVM more memory will generally solve issues with memory pressure and GC times. That's (generally) a better solution to performance problems vs doing the large buffer.
Now, that said, there certainly have been times where allocation pressure is a major problem and removing the allocation is the solution. In particular, I've found boxing to often be a major cause of performance problems.
In many cases you can also do better than using malloc e.g. if you know you need a huge page, map a huge page directly with mmap
Yes, if you want to use huge pages with arbitrary alloc/free, then use a third-party malloc. If your alloc/free patterns are not arbitrary, you can do even better. We treat malloc as a magic black box but it's actually not very good.
I've been using jemalloc for over 10 years and don't really see a need for it to be updated. It always holds up in benchmarks against any new flavor of the month malloc that comes out.
Last time I checked mimalloc which was admittedly a while ago, probably 5 years, it was noticebly worse and I saw a lot of people on their github issues agreeing with me so I just never looked at it again.
Benchmarks age fast. Treating a ten-year-old allocator as done just because it still wins old tests is tempting fate, since distros, glibc, kernel VM behavior, and high-core alloc patterns keep moving and the failures usually show up as weird regressions in production, not as a clean loss on someone's benchmark chart.
I've benchmarked them every few years, they never seem to differ by more than a few percent, and jemalloc seems to fragment and leak the least for processes running for months.
Mimalloc made the claim that they were the fastest/best when they released and that didn't hold up to real world testing, so I am not inclined to trust it now.
> Mimalloc made the claim that they were the fastest/best when they released and that didn't hold up to real world testing
That’s… ahistorical, at least so far as I remember. It wasn’t marketed as either of those; it was marketed as small/simple/consistent with an opt-in high-severity mode, and then its performance bore out as a result of the first set of target features/design goals. It was mainly pushed as easy to adopt, easy to use, easy to statically link, etc.
I feel like the real thing that needs to change is we need a more expressive allocation interface than just malloc/realloc. I'm sure that memory allocators could do a significantly better job if they had more information about what the program was intending to do.
You can also play tricks with inlining and constant propagation in C (especially on the malloc path, where the ground-truth allocation size is usually statically known).
Just out of curiosity are you getting 1GB huge pages on Xeon or some other platform? I always thought this class of page is the hardest to exploit, considering that the machine only has, if I recall correctly, one TLB slot for those.
Modern x86_64 has supported multiple page sizes for a long time. I'm on commodity Zen 5 hardware (9900X) with 128 GiB of RAM. Linux will still use a base page size of 4kb but also supports both 2 MiB and 1 GiB huge pages. You can pass something like `default_hugepagesz=2M hugepagesz=1G hugepages=16` to your kernel on boot to use 2 MiB pages but reserve 16 1 GiB pages for later use.
The nice thing about mimalloc is that there are a ton of configurable knobs available via env vars. I'm able to hand those 16 1 GiB pages to the program at launch via `MIMALLOC_RESERVE_HUGE_OS_PAGES=16`.
EDIT: after re-reading your comment a few times, I apologize if you already knew this (which it sounds like you did).
Right but on Intel the 1G page size has historically been the odd one. For example Skylake-X has 1536 L2 shared TLB entries for either 4K or 2M pages, but it only has 16 entries that can be used for 1G pages. It wasn't unified until Cascade Lake. But Skylake-like Xeon is still incredibly common in the cloud so it's hard to target the later ones.
If there is so much performance difference among generic allocators, it means you need semantic optimized allocators (unless performance is actually not that much important in the end).
One has to wonder if this due to the global memory shortage. ("Oh - changing our memory allocator to be more efficient will yield $XXM dollar savings over the next year").
Facebook had talks already years ago (10+) - nobody was allowed to share real numbers, but several facebook employed where allowed to share that the company has measured savings from optimizations. Reading between the lines, a 0.1% efficiency improvement to some parts of Facebook would save them $100,000 a month (again real numbers were never publicly shared so there is a range - it can't be less than $20,000), and so they had teams of people whose job it was to find those improvements.
Most of the savings seemed to come from HVAC costs, followed by buying less computers and in turn less data centers. I'm sure these days saving memory is also a big deal but it doesn't seem to have been then.
The above was already the case 10 years ago, so LLMs are at most another factor added on.
Yeah, identifying single-digit millions of savings out of profiles is relatively common practice at Meta. It's ~easy to come up with a big number when the impact is scaled across a very large numbers of servers. There is a culture of measuring and documenting these quantified wins.
On top of cost, they probably cannot get as much memory as they order in a timely fashion so offsetting that with greater efficiency matters right now.
Not just shortage, any improvements to LLMs/electricity/servers memory footprint is becoming much more valuable as the time goes. If we can get 10% faster, you can easily get a lead in the LLM race. The incentives to transparently improving performance are tremendous
> I knew from hard experience with Darwin that internally siloed open source projects cannot thrive (HHVM was a repeat lesson)
I'm glad HHVM happened, and also glad it stalled. I don't think PHP 7 and 8 would've made the improvements they did without HHVM kicking their ass, and I think there would've been a fork based on HHVM rather than PHP 8 if HHVM hadn't lost that public momentum.
I remember Wikimedia's testing/partial implementation of HHVM[1] being a turning point, at least in the circles I was in at the time. It showed PHP performance could actually be improved, and by quite a lot. Without that proof of concept at that scale _in the open_, HHVM devs could've ran benchmarks from here to eternity and people still would've said, "yeah, sure, _if you're Facebook_"
I remember I was a senior lead softeng of a worldbank funded startup project, and have deployed Ruby with jemalloc in prod. There's a huge noticeable speed and memory efficiency. It did saved us a lot of AWS costs, compare to just using normal Ruby. This was 8 years ago, why haven't projects adopt it yet as de facto.
> Facebook's coding AIs to the rescue, maybe? I wonder how good all these "agentic" AIs are at dreaded refactoring jobs like these.
No.
This is something you shouldn't allow coding agents anywhere near, unless you have expert-level understanding required to maintain the project like the previous authors have done without an AI for years.
If you filter the commits to the past five years, four of the top six committers are Meta employees. The other two might be as well, it just doesn't say that on their Github / personal website.
First impressions: LOL, the blunt commentary in the HN thread title compared to the PR-speak of the fb.com post.
Second thoughts: Actually the fb.com post is more transparent than I'd have predicted. Not bad at all. Of course it helps that they're delivering good news!
Few months back, some of the services switched to jemalloc for the Java VM. It took months (of memory dumps and tracing sys-calls) to blame the JVM, itself, for getting killed by the oom_killer.
Initially the idea was diagnostics, instead the the problem disappeared on its own.
Meta never abandoned jemalloc. https://github.com/facebook/jemalloc remained public the entire time. It's my understanding that Jason Evans, the creator of jemalloc, had ownership over the jemalloc/jemalloc repo which is why that one stopped being updated after he left.
Meta still maintained it and actively pushed commits to it fixing bugs and adding improvements. From this blog post it sounds like they are increasing investment into it along with resurrecting the original repo. When the repo was archived Meta said that development on jemalloc would be focused towards Meta's own goals and needs as opposed to the larger ecosystem.
This looks a lot as if the facebook/jemalloc repo inserted a single commit 70 commits ago and then rebased the changes in the original repo on top. Because the commit SHAs for the changes pulled in change you see this result.
> We plan to deliver improvements to [..] purging mechanisms
During my time at Facebook, I maintained a bunch of kernel patches to improve jemalloc purging mechanisms. It wasn't popular in the kernel or the security community, but it was more efficient on benchmarks for sure.
Many programs run multiple threads, allocate in one and free in the other. Jemalloc's primary mechanism used to be: madvise the page back to the kernel and then have it allocate it in another thread's pool.
One problem: this involves zero'ing memory, which has an impact on cache locality and over all app performance. It's completely unnecessary if the page is being recirculated within the same security domain.
The problem was getting everyone to agree on what that security domain is, even if the mechanism was opt-in.
https://marc.info/?l=linux-kernel&m=132691299630179&w=2
I recently started using Microsoft's mimalloc (via an LD_PRELOAD) to better use huge (1 GB) pages in a memory intensive program. The performance gains are significant (around 20%). It feels rather strange using an open source MS library for performance on my Linux system.
There needs to be more competition in the malloc space. Between various huge page sizes and transparent huge pages, there are a lot of gains to be had over what you get from a default GNU libc.
We evaluated a few allocators for some of our Linux apps and found (modern) tcmalloc to consistently win in time and space. Our applications are primarily written in Rust and the allocators were linked in statically (except for glibc). Unfortunately I didn't capture much context on the allocation patterns. I think in general the apps allocate and deallocate at a higher rate than most Rust apps (or more than I'd like at least).
Our results from July 2025:
rows are <allocator>: <RSS>, <time spent for allocator operations>
tcmalloc was from github.com/google/tcmalloc/tree/24b3f29.i don't recall which jemalloc was tested.
I’m surprised (unless they replaced the core tcmalloc algorithm but kept the name).
tcmalloc (thread caching malloc) assumes memory allocations have good thread locality. This is often a double win (less false sharing of cache lines, and most allocations hit thread-local data structures in the allocator).
Multithreaded async systems destroy that locality, so it constantly has to run through the exception case: A allocated a buffer, went async, the request wakes up on thread B, which frees the buffer, and has to synchronize with A to give it back.
Are you using async rust, or sync rust?
modern tcmalloc uses per CPU caches via rseq [0]. We use async rust with multithreaded tokio executors (sometimes multiple in the same application). so relatively high thread counts.
[0]: https://github.com/google/tcmalloc/blob/master/docs/design.m...
That’s a considerable regression for mimalloc between 2.1 and 2.2 – did you track it down or report it upstream?
Edit: I see mimalloc v3 is out – I missed that! That probably moots this discussion altogether.
nope.
If you go into Dr Dobbs, The C/C++ User's Journal and BYTE digital archives, there will be ads of companies whose product was basically special cased memory allocator.
Even toolchains like Turbo Pascal for MS-DOS, had an API to customise the memory allocator.
The one size fits all was never a solution.
I remember in the early days of web services, using the apache portable runtime, specifically memory pools.
If you got a web request, you could allocate a memory pool for it, then you would do all your memory allocations from that pool. And when your web request ended - either cleanly or with a hundred different kinds of errors, you could just free the entire pool.
it was nice and made an impression on me.
I think the lowly malloc probably has lots of interesting ways growing and changing.
One of the best parts about GC languages is they tend to have much more efficient allocation/freeing because the cost is much more lumped together so it shows up better in a profile.
Agreed, however there is also a reason why the best ones also pack multiple GC algorithms, like in Java and .NET, because one approach doesn't fit all workloads.
Then there’s perl, which doesn’t free at all.
Perl frees memory. It uses refcounting, so you need to break heap cycles or it will leak.
(99% of the time, I find this less problematic than Java’s approach, fwiw).
doesn't java also?
I heard that was a common complaint for minecraft
What do you mean - if Java returns memory to the OS? Which one - Java heap of the malloc/free by the JVM?
Java is pretty greedy with the memory it claims. Especially historically it was pretty hard to get the JVM to release memory back to the OS.
To an outsider, that looks like the JVM heap just steadily growing, which is easy to mistake for a memory leak.
> Especially historically it was pretty hard to get the JVM to release memory back to the OS.
This feels like a huge understatement. I still have some PTSD around when I did Java professionally between like 2005 and 2014.
The early part of that was particularly horrible.
Java has a quite strict max heap setting, it's very uncommon to let it allocate up to 25% of the system memory (the default). It won't grow past that point, though.
Baring bugs/native leaks - Java has a very predictable memory allocation.
Freedom is overrated... :P
When it works. Many programs in GC language end up fighting the GC by allocating a large buffer and managing it by hand anyway because when performance counts you can't have allocation time in there at all. (you see this in C all the time as well)
That's generally a bad idea. Not always, but generally.
It was a better idea when Java had the old mark and sweep collector. However, with the generational collectors (which are all Java collectors now. except for epsilon) it's more problematic. Reusing buffers and objects in those buffers will pretty much guarantees that buffer ends up in oldgen. That means to clear it out, the VM has to do more expensive collections.
The actual allocation time for most of Java's collectors is almost 0, it's a capacity check and a pointer bump in most circumstances. Giving the JVM more memory will generally solve issues with memory pressure and GC times. That's (generally) a better solution to performance problems vs doing the large buffer.
Now, that said, there certainly have been times where allocation pressure is a major problem and removing the allocation is the solution. In particular, I've found boxing to often be a major cause of performance problems.
In many cases you can also do better than using malloc e.g. if you know you need a huge page, map a huge page directly with mmap
Yes, if you want to use huge pages with arbitrary alloc/free, then use a third-party malloc. If your alloc/free patterns are not arbitrary, you can do even better. We treat malloc as a magic black box but it's actually not very good.
I've been using jemalloc for over 10 years and don't really see a need for it to be updated. It always holds up in benchmarks against any new flavor of the month malloc that comes out.
Last time I checked mimalloc which was admittedly a while ago, probably 5 years, it was noticebly worse and I saw a lot of people on their github issues agreeing with me so I just never looked at it again.
Mimalloc v3 has just come out (about a month ago) and is a significant improvement over both v2 and v1 (what you likely last tested)
Benchmarks age fast. Treating a ten-year-old allocator as done just because it still wins old tests is tempting fate, since distros, glibc, kernel VM behavior, and high-core alloc patterns keep moving and the failures usually show up as weird regressions in production, not as a clean loss on someone's benchmark chart.
It still beat mimalloc when I checked 4-5 years ago.
You really need to benchmark your workloads, ideally with the "big 3" (jemalloc, tcmalloc, mimalloc). They all have their strengths and weaknesses.
Jemalloc can usually keep the smallest memory footprint, followed by tcmalloc.
Mimalloc can really speed things up sometimes.
As usually, YMMV.
I've benchmarked them every few years, they never seem to differ by more than a few percent, and jemalloc seems to fragment and leak the least for processes running for months.
Mimalloc made the claim that they were the fastest/best when they released and that didn't hold up to real world testing, so I am not inclined to trust it now.
> Mimalloc made the claim that they were the fastest/best when they released and that didn't hold up to real world testing
That’s… ahistorical, at least so far as I remember. It wasn’t marketed as either of those; it was marketed as small/simple/consistent with an opt-in high-severity mode, and then its performance bore out as a result of the first set of target features/design goals. It was mainly pushed as easy to adopt, easy to use, easy to statically link, etc.
I feel like the real thing that needs to change is we need a more expressive allocation interface than just malloc/realloc. I'm sure that memory allocators could do a significantly better job if they had more information about what the program was intending to do.
There are, look no further than jemalloc API surface itself:
https://jemalloc.net/jemalloc.3.html
One thing to call out: sdallocx integrates well with C++'s sized delete semantics: https://isocpp.org/files/papers/n3778.html
You can also play tricks with inlining and constant propagation in C (especially on the malloc path, where the ground-truth allocation size is usually statically known).
I used mimalloc to run zenlisp under OpenBSD as it would clash with the paranoid malloc of base.
Just out of curiosity are you getting 1GB huge pages on Xeon or some other platform? I always thought this class of page is the hardest to exploit, considering that the machine only has, if I recall correctly, one TLB slot for those.
Modern x86_64 has supported multiple page sizes for a long time. I'm on commodity Zen 5 hardware (9900X) with 128 GiB of RAM. Linux will still use a base page size of 4kb but also supports both 2 MiB and 1 GiB huge pages. You can pass something like `default_hugepagesz=2M hugepagesz=1G hugepages=16` to your kernel on boot to use 2 MiB pages but reserve 16 1 GiB pages for later use.
The nice thing about mimalloc is that there are a ton of configurable knobs available via env vars. I'm able to hand those 16 1 GiB pages to the program at launch via `MIMALLOC_RESERVE_HUGE_OS_PAGES=16`.
EDIT: after re-reading your comment a few times, I apologize if you already knew this (which it sounds like you did).
Right but on Intel the 1G page size has historically been the odd one. For example Skylake-X has 1536 L2 shared TLB entries for either 4K or 2M pages, but it only has 16 entries that can be used for 1G pages. It wasn't unified until Cascade Lake. But Skylake-like Xeon is still incredibly common in the cloud so it's hard to target the later ones.
If there is so much performance difference among generic allocators, it means you need semantic optimized allocators (unless performance is actually not that much important in the end).
You are not wrong and this is indeed what zig is trying to push by making all std functions that allocate take a allocator parameter.
One has to wonder if this due to the global memory shortage. ("Oh - changing our memory allocator to be more efficient will yield $XXM dollar savings over the next year").
Facebook had talks already years ago (10+) - nobody was allowed to share real numbers, but several facebook employed where allowed to share that the company has measured savings from optimizations. Reading between the lines, a 0.1% efficiency improvement to some parts of Facebook would save them $100,000 a month (again real numbers were never publicly shared so there is a range - it can't be less than $20,000), and so they had teams of people whose job it was to find those improvements.
Most of the savings seemed to come from HVAC costs, followed by buying less computers and in turn less data centers. I'm sure these days saving memory is also a big deal but it doesn't seem to have been then.
The above was already the case 10 years ago, so LLMs are at most another factor added on.
Oooh maybe finally time for lovingly hand-optimized assembly to come back in fashion! (It probably has in AI workloads or so I daydream)
Yeah, identifying single-digit millions of savings out of profiles is relatively common practice at Meta. It's ~easy to come up with a big number when the impact is scaled across a very large numbers of servers. There is a culture of measuring and documenting these quantified wins.
On top of cost, they probably cannot get as much memory as they order in a timely fashion so offsetting that with greater efficiency matters right now.
Not just shortage, any improvements to LLMs/electricity/servers memory footprint is becoming much more valuable as the time goes. If we can get 10% faster, you can easily get a lead in the LLM race. The incentives to transparently improving performance are tremendous
Related. Others?
Jemalloc Postmortem - https://news.ycombinator.com/item?id=44264958 - June 2025 (233 comments)
Jemalloc Repositories Are Archived - https://news.ycombinator.com/item?id=44161128 - June 2025 (7 comments)
Opening up strong with a gigantic merge of the stuff they've been working on in their own fork: https://github.com/jemalloc/jemalloc/pull/2863
> I knew from hard experience with Darwin that internally siloed open source projects cannot thrive (HHVM was a repeat lesson)
I'm glad HHVM happened, and also glad it stalled. I don't think PHP 7 and 8 would've made the improvements they did without HHVM kicking their ass, and I think there would've been a fork based on HHVM rather than PHP 8 if HHVM hadn't lost that public momentum.
I remember Wikimedia's testing/partial implementation of HHVM[1] being a turning point, at least in the circles I was in at the time. It showed PHP performance could actually be improved, and by quite a lot. Without that proof of concept at that scale _in the open_, HHVM devs could've ran benchmarks from here to eternity and people still would've said, "yeah, sure, _if you're Facebook_"
1: https://techblog.wikimedia.org/2014/12/29/how-we-made-editin...
The URL of this story seems to have changed to a Meta press release. What are you quoting?
I remember I was a senior lead softeng of a worldbank funded startup project, and have deployed Ruby with jemalloc in prod. There's a huge noticeable speed and memory efficiency. It did saved us a lot of AWS costs, compare to just using normal Ruby. This was 8 years ago, why haven't projects adopt it yet as de facto.
I used jemalloc recently for ComfyUI/Wan and it’s literally magic. I’m surprised it doesn’t come that way by default.
Is there a concise timelime/history of this? I thought jemalloc was 100% open source, why is Meta in control of it?
Jason Evans (the creator of jemalloc) recounted the entire thing last year: https://jasone.github.io/2025/06/12/jemalloc-postmortem/
"Were I to reengage, the first step would be at least hundreds of hours of refactoring to pay off accrued technical debt."
Facebook's coding AIs to the rescue, maybe? I wonder how good all these "agentic" AIs are at dreaded refactoring jobs like these.
Refactor doesn't mean just artificial puff-up jobs, it's very likely internal changes and reorganization (hence 100s of hours).
There are not many engineers capable of working on memory allocators, so adding more burden by agentic stuff is unlikely to produce anything of value.
> Facebook's coding AIs to the rescue, maybe? I wonder how good all these "agentic" AIs are at dreaded refactoring jobs like these.
No.
This is something you shouldn't allow coding agents anywhere near, unless you have expert-level understanding required to maintain the project like the previous authors have done without an AI for years.
If you filter the commits to the past five years, four of the top six committers are Meta employees. The other two might be as well, it just doesn't say that on their Github / personal website.
First impressions: LOL, the blunt commentary in the HN thread title compared to the PR-speak of the fb.com post.
Second thoughts: Actually the fb.com post is more transparent than I'd have predicted. Not bad at all. Of course it helps that they're delivering good news!
It’s still quite corporate-y, but other than the way of writing I agree it’s generally quite clear.
Someone should tell Bryan Cantrill, he'd probably be ecstatic...
Few months back, some of the services switched to jemalloc for the Java VM. It took months (of memory dumps and tracing sys-calls) to blame the JVM, itself, for getting killed by the oom_killer.
Initially the idea was diagnostics, instead the the problem disappeared on its own.
>We are committed to continuing to develop jemalloc development
From the Department of Redundancy Department.
How is the original author making out in the new arrangement?
Jason Evans worked for Facebook for almost two decades, starting in 2009 - https://jasone.github.io/2025/06/12/jemalloc-postmortem/
He's doing just fine. If you're looking for a story about a FAANG company not paying engineers well for their work, this isn't it.
Jemalloc is used by android bionic libc library
Doesn't it depend on vendors/customization? Default is https://llvm.org/docs/ScudoHardenedAllocator.html since Android 11 (2020).
Meta never abandoned jemalloc. https://github.com/facebook/jemalloc remained public the entire time. It's my understanding that Jason Evans, the creator of jemalloc, had ownership over the jemalloc/jemalloc repo which is why that one stopped being updated after he left.
The repo's availability isn't related to whether it's still maintained.
Meta still maintained it and actively pushed commits to it fixing bugs and adding improvements. From this blog post it sounds like they are increasing investment into it along with resurrecting the original repo. When the repo was archived Meta said that development on jemalloc would be focused towards Meta's own goals and needs as opposed to the larger ecosystem.
I'm not directly involved enough to dig into the details here, but facebook/jemalloc currently says:
> This branch is 71 commits ahead of and 70 commits behind jemalloc/jemalloc:dev.
It looks like both have been independently updated.
This looks a lot as if the facebook/jemalloc repo inserted a single commit 70 commits ago and then rebased the changes in the original repo on top. Because the commit SHAs for the changes pulled in change you see this result.
The team probably sync'd the two after unarchiving the original.
Seems like they’d want to wait to commit until after the layoffs, right?
I work in the space. This article would not have been published if the team responsible was on the chopping block
It's just one team with like 4 people. They can layoff a lot of staff from Metaverse.
[dead]
[dead]
[dead]
And the Oscar for most mealy-mouthed post of the year goes to…
We generally try to avoid corporate press releases for that reason, but is there a good third-party post to replace it with?
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...