It's cool to see the end result, but I would prefer if the article focused a bit more on how it achieves such a solution. For example, how does it dispatch work to the various threads? Do they sleep when there's no work to do? If so, how do you wake them up? How does it handle cases where work is not uniformly distributed between your work items (i.e. some of them are a lot slower to process)? Is that even part of the end goal?
Yes, non-uniform workloads are supported! See `for_n_dynamic`.
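To give a rough idea of what dynamic scheduling usually means, here is a minimal sketch using plain `std::thread` and a shared atomic counter. It is not the library's code or API: `dynamic_for_n` is a hypothetical helper named for this example, and it spawns fresh threads on every call, which a real pool would avoid. Each thread claims the next unprocessed index, so a slow item only delays the thread that picked it up.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper for illustration only, NOT the fork_union API.
// Calls `task(i)` exactly once for every i in [0, n). Threads pull the next
// index from a shared counter, so uneven task durations balance themselves.
template <typename Task>
void dynamic_for_n(std::size_t n, std::size_t thread_count, Task task) {
    assert(thread_count > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            // Claim one index; relaxed ordering is enough for the counter itself.
            std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= n) return;
            task(i);
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t t = 1; t < thread_count; ++t) threads.emplace_back(worker);
    worker();                            // the calling thread participates too
    for (auto &th : threads) th.join();  // join makes all task effects visible
}
```

Usage would look like `dynamic_for_n(items.size(), 8, [&](std::size_t i) { process(items[i]); });`, where `items` and `process` are placeholders. The trade-off versus a static split is one atomic increment per item, which is why claiming indices in small batches is a common refinement.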
The threads “busy-wait” by running an infinite loop in a lower energy state on modern CPUs.
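For readers unfamiliar with the pattern, a generic busy-wait loop looks roughly like the sketch below. This is an assumption-laden illustration, not the code from the repository: `cpu_relax`, `spin_until_set`, and `run_spin_demo` are names made up here, and the hint instruction is only emitted on x86-64 and AArch64 builds.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

// Hypothetical helper: tells the core we are spinning, so it can back off.
inline void cpu_relax() noexcept {
#if defined(__x86_64__) || defined(_M_X64)
    _mm_pause();                       // x86 PAUSE instruction
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__("yield");     // AArch64 YIELD hint
#endif                                 // other targets: plain spin, no hint
}

// Spins until `flag` becomes true and returns how many times it looped.
inline std::size_t spin_until_set(const std::atomic<bool> &flag) noexcept {
    std::size_t spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        cpu_relax();
        ++spins;
    }
    return spins;
}

// Demo: a worker spins until the main thread publishes a value.
inline int run_spin_demo() {
    std::atomic<bool> ready{false};
    int payload = 0, observed = -1;
    std::thread worker([&] {
        spin_until_set(ready);         // acquire pairs with the release below
        observed = payload;
    });
    payload = 42;
    ready.store(true, std::memory_order_release);
    worker.join();
    return observed;
}
```

The waiting thread never blocks in the kernel here, so wake-up latency is just the time it takes to observe the store; the cost is that the core stays occupied while it spins.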
And yes, there are more details in the actual implementation in the repository itself. This section, for example, describes the atomic variables needed to control all of the logic: https://github.com/ashvardanian/fork_union?tab=readme-ov-fil...
> The threads “busy-wait” by running an infinite loop in a lower energy state on modern CPUs.
Doesn't that still use up part of the process's time slices from the OS scheduler's POV?