Can't Windows/Linux pin background threads to specific cores on Intel too, so that your foreground app isn't slowed down by all the background activity going on? Or is there something else to it that I don't understand? I thought the E-cores' main advantage is that they use less power, which is good for battery life on laptops, but the article makes it sound like the main advantage of Apple Silicon is that it splits foreground/background workloads better. Isn't that something that can already be done without a P/E distinction?
One thing that distinguishes macOS here is that the Mach kernel has the concept of “vouchers”, which helps the scheduler understand logical calls across IPC boundaries. So if you have a high-priority (UserInitiated) process, and it makes an IPC call out to a daemon that is usually a low-priority background daemon, the high-priority process passes a voucher to the low-priority one, which allows the daemon's IPC-handling thread to run high-priority (and thus access P-cores) for as long as it's holding the voucher.
This lets Apple architect things as small, single-responsibility processes, but make their priority dynamic, such that they’re usually low-priority unless a foreground user process is blocked on their work. I’m not sure the Linux kernel has this.
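For illustration: the voucher plumbing itself is mostly private to Mach/XPC, but the closest public analogue I know of is the pthread QoS override API on macOS, which gives the same effect of temporary priority donation. A rough sketch of the daemon side (the helper names are made up):

```c
#include <pthread.h>
#include <pthread/qos.h>

// Hypothetical daemon-side helpers: 'worker' is the thread that will
// service the incoming high-priority request.
static pthread_override_t begin_boost(pthread_t worker) {
    // While the override is active, 'worker' is scheduled as if it were
    // USER_INITIATED work (P-core eligible)...
    return pthread_override_qos_class_start_np(worker, QOS_CLASS_USER_INITIATED, 0);
}

static void end_boost(pthread_override_t boost) {
    // ...and it falls back to its usual background QoS afterwards.
    if (boost != NULL) pthread_override_qos_class_end_np(boost);
}
```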
That is actually quite simple and nifty. It reminds me of the four priorities RPC requests can have within the Google stack: 0 meaning “if this fails, it results in a big fat error for the user”, up to 3, meaning “we don't care if this fails, because we'll run the analysis job again in a month or so”.
IIRC, in macOS you do need to pass the voucher explicitly; it isn't inherited automatically. Linux has no knowledge of vouchers at all, so first the concept would have to be introduced and then apps would have to start using it.
It’s both.
Multithreading has been more ubiquitous in Mac apps for a long time, thanks to Apple offering mainstream multi-CPU machines very early on (circa 2000, predating even OS X itself) and making a point of making multithreading easy in its SDKs. By contrast, multicore machines weren't common in the Windows/x86 world until the late 2000s with the boom of Intel's Core series CPUs; single-core x86 CPUs persisted for several years after that, and Windows developer culture still hasn't embraced multithreading as fully as its Mac counterpart has.
This then made it dead simple for Mac developers to adopt task prioritization/QoS: work was already cleanly split into threads, so it was just a matter of specifying which are best suited to the E-cores and which to keep on the P-cores. And overwhelmingly, Mac devs have done that.
So the system scheduler is a good deal more effective than its Windows counterpart, because third-party devs have given it cues to guide it. The tasks most impactful to the user's perception of snappiness remain on the P-cores, while the E-cores stay busy with auxiliary work, keeping the P-cores unblocked and able to sleep more quickly and more often.
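To make that concrete, here's roughly what the QoS split looks like through libdispatch's C API (a minimal sketch; the two task functions are placeholders):

```c
#include <dispatch/dispatch.h>

void render_ui_update(void *ctx) { /* latency-sensitive work */ }
void reindex_mailbox(void *ctx)  { /* deferrable housekeeping */ }

int main(void) {
    // USER_INITIATED work is eligible for the P-cores...
    dispatch_async_f(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0),
                     NULL, render_ui_update);

    // ...while BACKGROUND work is steered toward the E-cores.
    dispatch_async_f(dispatch_get_global_queue(QOS_CLASS_BACKGROUND, 0),
                     NULL, reindex_mailbox);

    dispatch_main();  // never returns; keeps the queues servicing work
}
```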
Windows has had SetThreadPriority and SetThreadAffinityMask since at least Windows XP.
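For reference, roughly how those two calls are used together (a minimal sketch, error handling omitted):

```c
#include <windows.h>

static DWORD WINAPI worker(LPVOID arg) {
    /* background work */
    return 0;
}

int main(void) {
    HANDLE t = CreateThread(NULL, 0, worker, NULL, 0, NULL);

    // Hint to the scheduler that this thread is low priority...
    SetThreadPriority(t, THREAD_PRIORITY_BELOW_NORMAL);

    // ...and restrict it to logical core 0 (bit 0 of the mask).
    SetThreadAffinityMask(t, (DWORD_PTR)1);

    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
    return 0;
}
```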
Yeah, this is my guess as well. The other OSes have the ability to pin to specific cores, but Apple's first-party software leaned hard into coding to that hardware vision. Since Apple would love to merge its desktop and mobile software, being very deliberate about what is background vs. foreground work is essential. Windows and Linux have not had hardware guarantees differentiating between cores, so few programs have taken the effort to be explicit about how their work is executed.
When I ran GNOME, I was regularly annoyed at how often an indexing service would chew through CPU.
There was an article by Raymond Chen arguing that giving app developers an API to say “run me at high/low priority” rarely works: every developer views their program as the main character on the stage and couldn't care less about other programs' performance, so they're incentivized to enable the “high priority” option whenever given the chance, because it makes their program run better at the expense of everything else. So unless there's a strict audit in some kind of app store, or API rules that prevent developers from abusing the priority API, it's often better to let the OS decide all the scheduling dynamically as the programs run (say, the foreground UI window is automatically given high priority by the OS), so that scheduling stays fair.
The way it’s conceptualized on Apple platforms is primarily user-initiated vs. program-initiated, with the former getting priority. It’s positioned as being about tasks within a program competing for resources, rather than programs competing with each other.
So for example, if in an email client the user has initiated the export of a mailbox, that is given utmost priority while things like indexing and periodic fetches get put on the back burner.
This works because even a selfish developer wants their program to run well, and setting all tasks to high priority actively (and often visibly) impedes that, so they push less essential work to the background.
It just happens that in this case, smart threading on the per-process level makes life for the system scheduler easier.
It's the combination of the two that yields the best of both worlds.
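In code, the email-client example above maps onto thread-level QoS something like this (a sketch; the thread bodies are hypothetical):

```c
#include <pthread.h>
#include <pthread/qos.h>

static void *export_mailbox(void *arg) {
    // The user asked for this; mark it USER_INITIATED so it gets P-cores.
    pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
    /* ... do the export ... */
    return NULL;
}

static void *periodic_fetch(void *arg) {
    // Program-initiated housekeeping; UTILITY keeps it off the P-cores
    // while the export runs.
    pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0);
    /* ... fetch and index ... */
    return NULL;
}
```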
Android SoCs adopted heterogeneous CPU architectures ("big.LITTLE" in the ARM sphere) years before Apple did, and as a result there have been multiple attempts to tackle this in Linux. The latest, upstream, and perhaps most widely deployed way of using such processors efficiently is Energy-Aware Scheduling [1]. It allows the kernel to differentiate between performant and efficient cores and schedule work accordingly, avoiding situations in which brief workloads are put on P-cores while demanding ones hog the E-cores. Thanks to this, P-cores can also be put to sleep when their extra power is not needed, saving power.
One advantage macOS still has over Linux is that its kernel can tell performance-critical and background workloads apart without guessing. This is beneficial on all sorts of systems, but it particularly shines on heterogeneous ones, allowing unimportant workloads to always occupy E-cores and freeing the P-cores for loads that would benefit from them, or simply letting them sleep for longer. Apple solved this problem by defining a standard interface for user space to communicate such information down to the kernel [2]. As far as I'm aware, Linux currently lacks an equivalent [3].
Technically, your application can still pin its threads to individual cores (see the sketch after the links below), but to know which core is which, it would have to parse information internal to the scheduler. I haven't seen any Linux application that does this.
[1] https://www.kernel.org/doc/html/latest/scheduler/sched-energ...
[2] https://developer.apple.com/library/archive/documentation/Pe...
[3] https://github.com/swiftlang/swift-corelibs-libdispatch?tab=...
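To illustrate the pinning point: the affinity API itself is straightforward, the problem is knowing which core number is a P-core (a sketch; the core index is an arbitrary choice):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

// Pin the calling thread to one core. Nothing here tells you whether
// 'core' is a P-core or an E-core; the numbering is opaque to user space.
static int pin_self_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```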
> As far as I'm aware, Linux currently lacks an equivalent
SCHED_BATCH and SCHED_IDLE scheduling policies. They've been there since forever.
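For anyone curious, applying them is nearly a one-liner (minimal sketch; SCHED_IDLE shown, SCHED_BATCH is set the same way as a milder hint):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param p = { .sched_priority = 0 };  // must be 0 for these policies

    // SCHED_IDLE: run only when nothing else wants the CPU.
    if (sched_setscheduler(0 /* this process */, SCHED_IDLE, &p) != 0)
        perror("sched_setscheduler");

    /* ... deferrable work ... */
    return 0;
}
```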
Similarly, are there any modern benchmarks of the performance impact of pinning programs to a core on Linux? Are we talking <1%, or something actually notable for a CPU-bound program?
I have read there are some potential security benefits to keeping your most exploitable programs (e.g., a web browser) on their own dedicated core.
It's very heavily dependent on what your processes are doing. I've seen extreme cases where the gains from pinning were large (well over 2x when cooperative tasks were pinned to the same core), but that's primarily about preventing the CPU from idling long enough to enter deeper idle states.
Pinning exists, but the interesting part is signal quality: macOS gets consistent “urgency” signals (QoS) from a lot of frameworks and apps, so scheduling on heterogeneous cores is less of a guessing game than inferring intent from runtime behavior.
Yes, it's the job of the scheduler.
Linux, yes, of course.