The display controller and render device are completely distinct logical devices, even though they are often grouped in a "GPU". On mobile architectures they are quite far separated, leading to annoying problems surrounding what we on Linux call "split drm devices".

Updating plane properties such as to move the cursor plane around or disable it would by itself not block on render activities, as they are completely distinct blocks.

The render hardware could be powered down, but I doubt powering it up and compositing the cursor would take long enough to complete to cause any noticable lag.

Under the Linux APIs, updates to the display controller are done through KMS atomic commits, and one mistake you could do display-server side would be to provide a fence in this atomic commit that the scheduler will use to wait on long-running GPU work before using the provided graphics buffers. Under this API, none of the changes - including mouse movements - would then be applied until that fence is signalled. Changing plane associations can lead to resource reallocations that can be a bit heavy.

Not sure if the kernel driver in macOS works anything remotely similar to this, and the driver could also just be dumb and block on unrelated things ("let's just wait another vblank to see this apply....", "as we only need one plane now let's power down hardware and wait for that to settle..."). It could also just be windowserver that waits for work to finish on its own, not providing any cursor updates in the meantime.

The reality is that it will take reverse engineering or looking at actual code to know what's going on.

Since this is but an iPhone crammed into a laptop, could this behaviour stem from the fact that iPhones generally need not render a cursor?

No, the cursor just uses an overlay plane, and mobile architectures usually have far more planes (sometimes even an arbitrarily configurable amount), and more flexible hardware compositioning overall than desktop GPUs for efficiency reasons.

EDIT: Also note that there is nothing new with the Neo here, as all Macs since the M1 have used the same chip architecture as the iPhone.

Desktop GPU designs did not focus on tiny efficiency gains, and often only has a primary plane, a single overlay plane (for e.g., a video), and a dedicated cursor plane. Some even have to share a single overlay plane between all connected displays. It's a recent thing for desktop GPUs to get more flexible in this area, in part to improve laptop battery life in the cases where the laptop is almost entirely idle.

(For those unaware, a "plane" here is the entity in the display controller you configure to show a rendered graphics buffer, in a particular location and with particular transforms. You commonly have one plane that just covers the whole screen, and then sometimes put dynamic content on top in other planes so you can avoid having to redraw the main buffer when smaller bits of it change, like a video player or cursor. You could also e.g., scroll by rendering an entire document in advance and then move the plane around to reveal parts of it.)

> Desktop GPU designs did not focus on tiny efficiency gains

I'm not sure they're all that tiny if you can squeeze out 70% of top end performance for 25% of the power draw :)

I think the implication is they focussed on the huge efficiency gains, and didn't focus on the small ones?