But wouldn't the software cursor operations also go in the queue? I don't see the problem.

Modern GPUs usually have multiple command queues: at least one for application use (often separate queues for rendering and compute) and one for OS use. There's a good chance that a separate OS queue wasn't implemented on a chip intended for a phone.

For something as small as a cursor they could be doing direct framebuffer writes.