There's a lot of good work here and I don't want to minimise the issue in any way but: unless the Windows ACPI stack is implemented in an extremely fucked up way, I'd be surprised if some of the technical conclusions here are accurate. (Proviso: my experience here is pretty much all Linux, with the bits that aren't still being the ACPI-CA stack that's used by basically every OS other than Windows. Windows could be bizarre here, but I'd be surprised if it diverged to a huge degree)
AML is an interpreted language. Interrupts need to be handled quickly, because while a CPU is handling an interrupt it can't handle any further interrupts. There's an obvious and immediate conflict there, and the way this is handled on every other OS is that upon receipt of an ACPI interrupt, the work is dispatched to something that can be scheduled rather than handled directly in the interrupt handler. ACPI events are not intended to be performance critical. They're also not supposed to be serialised as such - you should be able to have several ACPI events in flight at once, and the language even includes mutex support to prevent them stepping on each other[1]. Importantly, Sleep() is intended to be a "Wake me when at least this much time has passed" event, not a "Spin the CPU until this much time has passed" event. Calling Sleep() should let the CPU go off and handle other ACPI events or, well, anything else. So there's a legitimate discussion to be had about whether this is a sensible implementation or not, but in itself the Sleep() stuff is absolutely not the cause of all the latency.
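To make the Sleep()/serialisation point concrete, here's roughly what a GPE handler using both looks like in ASL. This is a purely illustrative sketch - the mutex name, the Notify target, and the values are made up, not taken from the laptop's actual tables:

```asl
Scope (\_GPE)
{
    Mutex (MUTX, 0)     // AML-level mutex, SyncLevel 0

    Method (_L02, 0, NotSerialized)   // level-triggered handler for GPE 2
    {
        Acquire (MUTX, 0xFFFF)   // serialise against other in-flight events
        Sleep (100)              // "wake me in >= 100ms": the interpreter is
                                 // free to run other ACPI events meanwhile
        // Stall (100) would instead busy-wait; it's specified in
        // microseconds and is not supposed to exceed 100us
        Release (MUTX)
        Notify (\_SB.PCI0.PEG0, 0x81)  // hypothetical notification to a device
    }
}
```

The point is that Sleep() inside a method like this only costs wall-clock time for that one event; it shouldn't, by itself, stop the OS (or other ACPI events) from making progress.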
What's causing these events in the first place? I thought I'd be able to work this out because the low GPE numbers are generally assigned to fixed hardware functions, but my back's been turned on this for about a decade and Intel's gone and made GPE 2 the "Software GPE" bit. What triggers a software GPE? Fucked if I can figure it out - it's not described in the chipset docs. Based on everything that's happening here it seems like it could be any number of things, the handler touches a lot of stuff.
But OK, we have something that's executing a bunch of code. Is that in itself sufficient to explain video and audio drops? No. All of this is being run on CPU 0, and this is a multi-core laptop: if CPU 0 is busy, the OS can keep running everything else on the other cores. The problem here is that all cores suddenly stop executing user code, and the most likely explanation for that is System Management Mode.
SMM is a CPU mode present in basically all Intel CPUs since the 386SL back in 1989 or so. Code accesses a specific IO port, the CPU stops executing the OS, and instead starts executing firmware-supplied code in a memory area the OS can't touch. The ACPI decompilation only includes the DSDT (the main ACPI table) and not any of the SSDTs (additional ACPI tables that typically contain code for additional components such as GPU-specific methods), so I can't look for sure, but what I suspect is happening here is that one of the _PS0 or _PS3 methods is triggering into SMM and the entire system[2] is halting while that code is run, which would explain why the latency is introduced at the system level rather than it just being "CPU 0 isn't doing stuff".
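In the ASL, an SMM trap usually looks completely innocuous: a one-byte SystemIO OpRegion on the chipset's SMI command port (conventionally 0xB2 on Intel platforms), and a method that writes a command byte to it. A hypothetical sketch of what that shape looks like - the region name, command value, and method placement are illustrative, not from this machine's tables:

```asl
OperationRegion (SMIC, SystemIO, 0xB2, 1)  // APM/SMI command port
Field (SMIC, ByteAcc, NoLock, Preserve)
{
    SCMD, 8
}

Method (_PS0, 0, Serialized)   // "power the device up"
{
    Store (0xD1, SCMD)   // this single write raises an SMI: every core
                         // stops executing the OS and runs firmware code
                         // in SMRAM until the handler returns
}
```

So a _PS0 that takes milliseconds can look like one line of ASL; all the time is spent on the other side of that port write, invisible to the OS.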
And, well, the root cause here is probably correctly identified, which is that the _L02 event keeps firing and when it does it's triggering a notification to the GPU driver that is then calling an ACPI method that generates latency. The rest of the conclusions are just not important in comparison. Sleep() is not an unreasonable thing to use in an ACPI method, it's unclear whether clearing the event bits is enough to immediately trigger another event, and it's unclear whether sending events that trigger the _PS0/_PS3 dance makes sense under any circumstances here rather than worrying about the MUX state. There's not enough public information to really understand why _L02 is firing, nor what the firmware is trying to achieve by powering up the GPU, calling _DOS, and then powering it down again.
[1] This is absolutely necessary for some hardware - we hit issues back in 2005 where an HP laptop just wouldn't work if you couldn't handle multiple ACPI events at once
[2] Why the entire system? SMM is able to access various bits of hardware that the OS isn't able to, and figuring out which core is trying to touch hardware is not an easy thing to work out, so there's a single "We are in SMM" bit and all cores are pushed into SMM and stop executing OS code before access is permitted, avoiding the case where going into SMM on one CPU would let OS code on another CPU access the forbidden hardware. This is all fucking ludicrous but here we are.
Thanks for the detailed insight, I've added your comment verbatim to the report so it doesn't get buried. I'd plainly over-weighted the GPE path itself and missed that the stall could be happening somewhere else entirely. I've taken a deeper look at the SSDT/DSDT files as a result, and I saw that under _SB.TPM there's a SystemIO OpRegion at TSMI(SMIA,1) with an 8-bit SMI field and several _DSM case writes (an SMI assert via the SMI command port, I'm assuming). There's also an EXBU scratchpad with FSMI, ACPF, ALPR and so on, which I assume is some kind of mailbox used by the SMM/EC plumbing. Beyond that, I don't see anything that immediately stands out in the PEG* scopes themselves. Would love to know your thoughts on this; here's the full SSDT/DSDT decompiled code:
https://limewire.com/d/SoEOd#hEZzJ4PWAr
If I were to decompile the SSDTs on an affected laptop, would you be able to see if/when the system is entering SMM and what it's doing in there that's taking so long? I've noticed this issue on several Windows gaming laptops with Nvidia GPUs over the years, and I even got as far as installing event tracing, the Intel BIOS tools, etc., but it was all a bit beyond my depth. I've done some DJ'ing and would like to use my gaming laptop for it, but the whole system freezing so hard that audio glitches every minute is probably part of why the "nobody uses Windows for live audio due to latency spikes" thing is so widespread. Not to mention it makes, you know, gaming unpleasant.
Likely able to identify that it's entering SMM, unfortunately not able to demonstrate what it's doing inside there.
Would it be practical to patch out any SMM calls and get applications to run more smoothly?
It's not super easy under Windows and it would likely disable dGPU power management. Easier to patch the DSDT not to send those notifications (although that's still not super easy under Windows)
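For what "patch the DSDT not to send those notifications" could look like in practice: the minimal edit is to stub the Notify out of the GPE handler and recompile the table. This is a hypothetical sketch - the handler body and Notify target are invented for illustration, and on the real machine you'd keep whatever status-clearing code the method actually contains:

```asl
// Hypothetical patched handler: status handling kept, Notify dropped
Method (_L02, 0, NotSerialized)
{
    // (original event/status-clearing code left as-is)
    // Notify (\_SB.PCI0.PEG0.PEGP, 0x81)  // removed: this is what kicked
                                           // the GPU driver into the
                                           // _PS0/_DOS/_PS3 dance
}
```

The trade-off is as stated above: the driver never hears about whatever _L02 was reporting, so dGPU power management (and anything else hanging off that notification) may stop working.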