I think thread pools are one of those solved problems that the silent majority of C programmers solved ages ago and just don't release open-source projects for. I've also written my own 300-line thread pool and allocator combined, and I always laughed at Taskflow, Rayon, etc. Even NUMA is easy with arena allocation. When Casey Muratori of the Handmade Network said the same thing, I remember agreeing, and he got made fun of for it.
BTW, the parallel-for case can simply be supported by setting a pool-wide (or global) boolean and using it to decide how workers wait for a new task: during the parallel-for the boolean is true and they keep polling for work; otherwise they sleep, using mutexes in the worst case, to save energy.
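A minimal sketch of that hybrid wait in C11 with POSIX threads, assuming a fork-join parallel-for in which the calling thread also does work. This is not the commenter's code, and every name here (pool_t, spin, generation, parallel_for, pool_set_spin) is hypothetical. While spin is true, idle workers poll a generation counter with sched_yield; when it is false, they sleep on a condition variable. Flag and generation changes are made under the mutex and followed by a broadcast, so a worker that is about to sleep cannot miss them.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: a sketch, not a production pool. */
typedef void (*body_fn)(size_t i, void *ctx);
#define MAX_WORKERS 16

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    pthread_t       threads[MAX_WORKERS];
    int             nworkers;
    atomic_bool     spin;        /* the pool-wide boolean: true = poll, false = sleep */
    atomic_bool     quit;
    atomic_uint     generation;  /* bumped once per parallel_for call */
    atomic_int      acked;       /* workers finished with the current job */
    body_fn         fn;          /* current job; written before generation is bumped */
    void           *ctx;
    size_t          n;
    atomic_size_t   next;        /* next loop index to claim */
} pool_t;

static void run_indices(pool_t *p) {
    size_t i;
    while ((i = atomic_fetch_add(&p->next, 1)) < p->n)
        p->fn(i, p->ctx);
}

static void *worker(void *arg) {
    pool_t *p = arg;
    unsigned seen = 0;
    for (;;) {
        while (atomic_load(&p->generation) == seen && !atomic_load(&p->quit)) {
            if (atomic_load(&p->spin)) {   /* hot: stay awake for low latency */
                sched_yield();
                continue;
            }
            pthread_mutex_lock(&p->lock);  /* cold: sleep to save energy */
            while (atomic_load(&p->generation) == seen && !atomic_load(&p->spin) &&
                   !atomic_load(&p->quit))
                pthread_cond_wait(&p->wake, &p->lock);
            pthread_mutex_unlock(&p->lock);
        }
        if (atomic_load(&p->quit)) return NULL;
        seen = atomic_load(&p->generation);
        run_indices(p);
        atomic_fetch_add(&p->acked, 1);    /* tell the caller we are done with this job */
    }
}

static void pool_init(pool_t *p, int nworkers) {
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    atomic_init(&p->spin, false);
    atomic_init(&p->quit, false);
    atomic_init(&p->generation, 0);
    atomic_init(&p->acked, 0);
    atomic_init(&p->next, 0);
    p->nworkers = nworkers > MAX_WORKERS ? MAX_WORKERS : nworkers;
    for (int t = 0; t < p->nworkers; t++)
        pthread_create(&p->threads[t], NULL, worker, p);
}

static void pool_set_spin(pool_t *p, bool on) {
    pthread_mutex_lock(&p->lock);
    atomic_store(&p->spin, on);
    pthread_cond_broadcast(&p->wake);      /* sleepers wake up and start polling */
    pthread_mutex_unlock(&p->lock);
}

static void parallel_for(pool_t *p, size_t n, body_fn fn, void *ctx) {
    p->fn = fn; p->ctx = ctx; p->n = n;
    atomic_store(&p->next, 0);
    atomic_store(&p->acked, 0);
    pthread_mutex_lock(&p->lock);
    atomic_fetch_add(&p->generation, 1);   /* publish the job */
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    run_indices(p);                        /* the caller helps too */
    while (atomic_load(&p->acked) < p->nworkers)
        sched_yield();                     /* join: every worker has left this job */
}

static void pool_destroy(pool_t *p) {
    pthread_mutex_lock(&p->lock);
    atomic_store(&p->quit, true);
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    for (int t = 0; t < p->nworkers; t++)
        pthread_join(p->threads[t], NULL);
    pthread_cond_destroy(&p->wake);
    pthread_mutex_destroy(&p->lock);
}
```

You would switch spin on around a burst of parallel-fors (say, one frame of a game loop) and off afterwards, so idle workers stop burning CPU. Making the caller wait for every worker to acknowledge each job keeps the sketch simple and free of races between consecutive jobs, at some cost in scaling.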
I'd love to learn more about this. What resources/books/articles/code can I look at to understand this more? Or, if you have some time, would you mind expanding on it?
The parts I'm specifically interested in:
1. What the 300-line pool and allocator look like
2. What this means: "BTW the for-case can simply be supported by setting a pool/global boolean and using that to decide how to wait for a new task (during the parallel for the boolean will be true, otherwise do sleeps with mutexes in the worst case for energy saving)"
Thank you!
This stuff is sometimes difficult to search for because people don’t name it or there are many different names.
Arena allocation on Windows, for example, is basically calling VirtualAlloc for a couple of gigabytes on a 64-bit system (you have terabytes of virtual address space available) and then slicing it into sub-ranges that you pass as parameters to threads, grouped hierarchically: first per CPU, then within that per group of cores that share a cache, then per single core for its own cache. Pin the software threads to their hardware threads and you're done. Then, within each arena, use bump allocators (and maybe pool allocators) for most things. It's very basic, takes little code, and performs much better than most software out there. It's also why a lot of die-hard C programmers find Rust's lifetime management overengineered and tedious, btw: you don't have nearly as many distinct lifetimes as in, say, modern C++ code.
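A rough sketch of the arena part in C, assuming one up-front reservation (VirtualAlloc on Windows as described above, mmap elsewhere so the sketch also runs on POSIX systems) and an even split at each level of the hierarchy. The names arena_t, arena_push, and arena_split and the 64-byte cache-line alignment are my own assumptions; NUMA placement and thread pinning are left out.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc under -std=c11 */
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

#define CACHE_LINE 64    /* assumed cache-line size */

typedef struct {
    uint8_t *base;
    size_t   cap;
    size_t   used;
} arena_t;

static void *os_reserve(size_t size) {
#ifdef _WIN32
    /* Sketch: reserve and commit at once; for gigabytes you would
       MEM_RESERVE up front and MEM_COMMIT pages as the arena grows. */
    return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

static void os_release(void *p, size_t size) {
#ifdef _WIN32
    (void)size;
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, size);
#endif
}

/* Bump allocation: round the offset up, hand out the slice, advance.
   align must be a power of two no larger than the arena's base alignment. */
static void *arena_push(arena_t *a, size_t size, size_t align) {
    size_t off = (a->used + align - 1) & ~(align - 1);
    if (off > a->cap || size > a->cap - off) return NULL;
    a->used = off + size;
    return a->base + off;
}

static void arena_reset(arena_t *a) { a->used = 0; }  /* "frees" everything at once */

/* Split the parent's remaining space evenly into count cache-line-aligned
   children: e.g. per CPU, then per cache-sharing core group, then per core. */
static void arena_split(arena_t *parent, arena_t *children, size_t count) {
    size_t start = (parent->used + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
    size_t each = start < parent->cap
        ? ((parent->cap - start) / count) & ~(size_t)(CACHE_LINE - 1) : 0;
    for (size_t i = 0; i < count; i++) {
        children[i].base = arena_push(parent, each, CACHE_LINE);
        children[i].cap  = children[i].base ? each : 0;
        children[i].used = 0;
    }
}
```

Each thread would then receive its own leaf arena as a parameter, so the hot path needs no locks, and resetting a leaf arena releases everything allocated from it in one step.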
For the boolean stuff, look at the Better Software Conference YouTube talk about that video game physics engine, for example (sorry, I'm on my phone on the go). Again, old ideas being rediscovered.
I totally agree — most C/C++ developers with 10+ years of experience have built similar thread pools or allocators in their own codebases. I’d include myself and a few of my former colleagues on that list.
That said, closed-source solutions for local use aren’t quite the same as an open-source project with wider validation. With more third-party usage, reviews, and edge cases, you often discover issues you’d never hit in-house. Some of the most valuable improvements I’ve seen have come from external bug reports or occasional PRs from people using the code in very different environments.
There are many open-source and academic libraries for parallel programming with performance similar to or better than OpenMP's.
> When casey muratori of the handmade network said the same thing I remember agreeing and he got made fun of for it.
Casey Muratori, while a great programmer in his own right, often disregards the use cases, which leads to apples-to-oranges comparisons. For example: why is this ASCII editor from 40 years ago so much faster than this Unicode text editor (with full support for zero-width-joiner emoji sequences)?
Are you alluding to the Microsoft Terminal fiasco? It was the other way around: his terminal supported more text features.