This is an amazing discovery, article, and fix proposal. Fantastic work, very impressive and also very instructive on how things work on modern PCs and how far you can actually dig to get at stuff that is "supposed" to be hidden.
As someone who has written embedded firmware for many years (not for PCs), I can only dream of an end user being capable enough to discover a bug like this. I want to live in the world where Asus immediately sends an e-mail offering some kind of short-term contracting work: fly in, talk to their firmware people for a few days, get $FIVE_FIGURES or something, and leave with an updated laptop running their new production BIOS.
Obviously this bug has gone un-fixed for four years so that is not the world we're in. That makes me sad. :|
Edit: s/fix/fix proposal/.
The technical RCA is fascinating, but I'm also interested in the business process RCA.
- this sounds ubiquitous and reproducible. How did this not get fed back through tech support/RMA channels? Was there so little evidence that it couldn't be correlated, or did ASUS look and arrive at an incorrect conclusion, e.g. a batch of bad silicon? Could it be that they had plentiful evidence and were negligent or incompetent?
- it sounds like this is plainly evident when using the machine. What is the QA process? This should not have been possible to miss?
- now that they know, what will they do?
IMO, the CEO calculus here is clear. If you're a luxury good with elastic demand, you fix the issue and fix the perception (two separate things). Multi-year, multifaceted issues like this have the potential to ruin a brand. I've bought ROG in the past, and I'm inclined to never do so again.
EDIT: on further reflection, the firmware bug itself is pretty troubling. The other bugs I get: hardware assumptions were changed, or good code was reused that didn't know about or support the GPU mux. I see how those errors come about. But the method sleeping in an interrupt... is awful? How did that get reviewed? What is that firmware test suite?
It doesn’t matter that the consumer/gamer laptop is a piece of shit, because all of the competitors are too. Consumer hardware is a volume business, and the actual end-user experience matters very little compared to endorsement deals and marketing strategies.
Every one of the affected ASUS laptops probably got a glowing 5/5 review from the usual suspects, and consumers have little hope of getting a fair deal
this is not wrong. i've had a bunch of ASUS and ASRock stuff that was completely unusable. all had 5 star reviews while tech forums all showed people unhappy, with broken network ports, CPUs blowing up the boards due to 'too much IOPS' and other silly things that you do not expect from running a new device with fully compatible components....
There are vendors which do better generally, or have less aggressive 5-star robots. I got an MSI board now which came at a fraction of the cost of an ASUS board. It has worse specifications, but in all honesty, it works. It does what it says on the box without any grief. -- maybe it was a lucky shipment -- , but I am not going back to ASUS or ASRock. I'd rather have 2 FPS less but a device that stays operational and can do its basic features...
A classic example of not giving a toss about quality is the _horrible_ integration done for the Windows Hello protocol on many platforms. The protocol is really good, yet there are bypasses possible on a lot of devices due to bad/incorrect implementations, completely breaking a for-once-actually-good-thing that MS designed.
buying consumer hardware, especially for gaming, is like a lottery these days, and shops / vendors give a lot of grief often (not always..) declining refunds or blaming bad user practices for clear device defects.
It's exacerbated by internet warriors that defend their brand. I guess they are living the "gamer" lifestyle. I have been burned by ASUS monitors: slow boot up, issues with detecting a Mac device requiring a complete monitor reboot (you'd think DP/HDMI are universal standards, no?), thin screen supports that ultimately fail, horrible built-in speakers. Just a "meh" product. However I guess I'm in the minority, as usually when I bring up this story someone chimes in saying they've had none of these problems with their ASUS product ¯\_(ツ)_/¯
Ultimately I think this problem will fix itself. ASUS will eventually burn through enough customers that they will have to exit certain segments I guess?
Is this why MacBooks have had weird flickering issues with some monitors since the M1 with no end in sight?
> What is the QA process? This should not have been possible to miss?
Have you used consumer goods [or virtually anything] from the last couple decades? By and large, nobody cares. Look at the timeline here; clearly nobody cares.
the QA process is shipping to customers and sticking their head in the pile of money they amass with a broken piece of junk.
Yeah. ACPI's AML bytecode is sort of a mixed blessing. It allows for reverse engineering and end user analysis/fixing of bugs like this.
It's also just a terrible disaster of a programming environment, with a very large (terrifyingly so, given the limited capability) interpreter that needs to live at the highest privilege level of the kernel.
And it's generally used like a hatchet by system integrators for tricks like this, with pretty much exactly the code quality you'd expect. Almost always the path to writing a Linux driver for some oddball laptop subsystem starts with "throw away the ACPI stuff".
As far as I know there are three ACPI AML stacks: the reference Intel one, Linux uses this, Microsoft has one, and those crazy hackers over at the OpenBSD project decided to make their own.
I think that's right. Though my understanding is that the Windows code is derived from the original Intel one too and has evolved in tandem with the Intel-maintained driver. And... yeah, acpica (drivers/acpi/acpica) is just huge; I checked again and it's at 2.5MB of source code. All for a DSDT table parser and a virtual machine with about the capability of a 6502.
and off by one errors
and off by one errors
This is why I prefer having my software vendor write native code to interface with the machine instead of stupid bodges like ACPI.
Windows laptops are dead on arrival for me; all Windows laptops are physical shovelware.
But again, AML disassembly may show that it was bad code, but it's at-least-mostly-working code, and provided in a form that can be disassembled and inspected. Lots and lots of robust Linux drivers have been written based on analysis of garbage ACPI integration.
Mixed blessing, but still more blessing than curse.
As a user and programmer, I can only dream of being this knowledgeable about things. There's a ton of domain knowledge embedded in the article, it's pretty amazing.
I managed to reverse engineer a lot of my laptop's features but hit a wall when it came to this ACPI stuff. I dumped the tables and decompiled the code but all I got was stub code. I wanted to be the guy who wrote the Linux drivers for his own laptop but I just didn't manage it. Massive respect for anyone who can do this.
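For anyone attempting the same, here is a rough sketch of a first sanity check on a dumped table before feeding it to a disassembler. It relies on the standard 36-byte ACPI table header and the rule that all bytes of a valid table sum to zero modulo 256. The function names and the synthetic test table are my own illustration, not anything from the article.

```python
import struct
from collections import namedtuple

# Standard 36-byte ACPI system description table header, little-endian:
# signature, length, revision, checksum, OEM ID, OEM table ID,
# OEM revision, creator ID, creator revision.
_HEADER = struct.Struct("<4sIBB6s8sI4sI")
_CHECKSUM_OFFSET = 9  # byte offset of the checksum field in the header

AcpiHeader = namedtuple(
    "AcpiHeader",
    "signature length revision checksum oem_id oem_table_id "
    "oem_revision creator_id creator_revision",
)


def parse_acpi_header(blob: bytes) -> AcpiHeader:
    """Decode the fixed header at the start of a raw ACPI table."""
    if len(blob) < _HEADER.size:
        raise ValueError("table is shorter than the 36-byte ACPI header")
    return AcpiHeader(*_HEADER.unpack_from(blob))


def checksum_ok(blob: bytes) -> bool:
    """A valid table's bytes (over its declared length) sum to 0 mod 256."""
    header = parse_acpi_header(blob)
    if header.length > len(blob):
        return False  # truncated dump
    return sum(blob[:header.length]) % 256 == 0


def make_fake_table(signature: bytes, body: bytes) -> bytes:
    """Build a synthetic table (hypothetical data) with a valid checksum."""
    length = _HEADER.size + len(body)
    raw = bytearray(
        _HEADER.pack(signature, length, 2, 0, b"FAKEOE", b"FAKETBL ", 1, b"TEST", 1)
        + body
    )
    raw[_CHECKSUM_OFFSET] = (-sum(raw)) % 256
    return bytes(raw)
```

On Linux the raw tables are usually readable (as root) under /sys/firmware/acpi/tables/, so something like `parse_acpi_header(open("/sys/firmware/acpi/tables/DSDT", "rb").read())` should show the signature and OEM fields. A dump that fails `checksum_ok` is probably truncated or not the table you think it is.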
Yeah the best way to go is to buy Linux preinstalled and supported. Though, as with Windows in this case, that still won't save you if the system integration and firmware teams don't do their job.
sorry, what fix? the linked github page ends in "here is everything, ASUS, please fix it", right?
He knows how to fix it, and he fixed his system. It's not universal, you need a custom patch for each model and fw version.
Yeah, sorry, that was a bit unclear. I just meant that the article went as far as to propose rather clearly what is needed to fix the issue ("don't sleep() in an interrupt service routine").
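To illustrate the general rule (not ASUS's actual AML, which lives in firmware), here is a toy Python simulation of the pattern: the handler only records the event and returns, and the slow part runs later outside interrupt context. All names here are hypothetical, and the `time.sleep` merely stands in for whatever slow work the real handler does.

```python
import queue
import time

# Events acknowledged by the handler but not yet processed.
deferred = queue.Queue()


def slow_firmware_work(event: str) -> str:
    """Stand-in for the slow part of handling an event (hypothetical)."""
    time.sleep(0.01)
    return f"handled {event}"


def bad_handler(event: str) -> str:
    # Anti-pattern: blocks while still in "interrupt context", so nothing
    # else can be serviced until the sleep finishes.
    return slow_firmware_work(event)


def good_handler(event: str) -> None:
    # Do the minimum: record the event and return immediately.
    deferred.put(event)


def drain_deferred() -> list:
    """Run the slow work later, from ordinary (non-interrupt) context."""
    results = []
    while True:
        try:
            event = deferred.get_nowait()
        except queue.Empty:
            return results
        results.append(slow_firmware_work(event))
```

Calling `good_handler("gpe-0x02")` returns right away, and a later `drain_deferred()` does the slow part. Real kernels and firmware have their own deferral mechanisms; the point is only that the handler itself never waits.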
Truly awesome analysis, it's great that Asus spends this effort to quality check their hot garbage.. oh wait..