Hacker News

Amazing talk. Here's a quick writeup if you don't want to watch the full hour or don't have enough hardware knowledge to follow what Markus is talking about, as he goes very fast, in some cases too fast to even let you read the text on his slides. It's mandatory to use the pause key to understand the full details even if you have a deep understanding of every relevant technology, of which he explains none.

The Xbox uses a very advanced variant of the same technologies that also exist on smartphones, tablets and Secure Boot enabled PCs. When fully operational the Xbox security system prevents any unsigned code from running, keeps all code encrypted, proves to remote servers (Xbox Live) that it's a genuine device running in a secure state, and on this base you can build strong anti-piracy checks and block cheating.

The Xbox has several processors and what follows applies to the Platform Security Processor. When a computer starts up (any computer), the CPU begins execution in a state in which basically nothing works, including external communication and even RAM. Executions starts at a 'reset vector' mapped to a boot ROM i.e. the bytes are hard-wired into the silicon itself and can't be changed. The boot ROM then executes instructions to progressively enable more and more hardware, including things like activating RAM. Until that point the whole CPU executes out of its cache lines and can't use more memory than exists on-die.

Getting to the state where the Xbox can achieve all its security goals thus requires it to boot through a series of chained steps which incrementally bring the hardware online, and each step must verify the integrity of the next. The boot ROM is only 19kb of code and a few more kb of data, and can't do much beyond just activating RAM, the memory mapping unit (called MPU on the Xbox), and reading some more code out of writeable flash RAM. The code it reads from flash RAM is the second stage bootloader where much more work gets done, but from this second stage on it can be patched remotely by Microsoft. So if bugs are found there or in any later stage, it hardly matters because MS can issue a software update and detect remotely on Xbox Live servers if that upgrade was applied, so kicking out cheaters and pirates. The second stage boot loader in turn loads more code from disk, signature checks and decrypts it, sets up lots of software security schemes like hypervisors and so on, all the way up to the OS and the games.

Therefore to break Xbox security permanently you have to attack the boot ROM, because that's the only part that can't be changed via a software update. It's the keys to the kingdom and this is what Markus attacked. Attacking the boot ROM is very, very hard. The Xbox team were highly competent:

• Normally the bringup code would be written by the CPU or BIOS vendors but MS wrote it all in house themselves from scratch.

• The code isn't public and has never leaked. To obtain it, someone had to decode it visually by looking at the chip under a scanning electron microscope and map the atomic pictures to bits and then to bytes.

• Having the code barely helps because there are no bugs in it whatsoever.

So, the only way to manipulate it is to actually screw with the internals of the CPU itself by "glitching", meaning tampering with the power supply to the chip at exactly the right moment to corrupt the state of the internal electronics. Glitching a processor has semi-random effects and you don't control what happens exactly, but sometimes you can get lucky and the CPU will skip instructions. By creating a device that reboots the machine over and over again, glitching each time, you can wait until one of those attempts gets lucky and makes a tiny mistake in the execution process.

Glitching attacks predate the Xbox and were mostly used on smartcards until the Xbox 360, which was successfully attacked this way. So Microsoft knew all about them and added many mitigations, beyond "just" writing bug free code:

1. The boot ROM is full of randomized loops that do nothing but which are designed to make it hard to know where in the program the CPU has got to. Glitching requires near perfect timing and this makes it harder.

2. They hardware-disabled the usual status readouts that can be used to know where the program got up to and debug the boot process.

3. They hash-chain execution to catch cases where steps were skipped, even though that's impossible according to program logic.

4. They effectively use a little 'kernel' and run parts of the boot sequence as 'user mode' programs, so that if sensitive parts of the code are glitched they are limited in how badly they can tamper with the boot process.

And apparently there are even more mitigations added post-2013. Markus managed to bypass these by chaining two glitch attacks together, one which skipped past the code that turned on the MMU, which made it possible to break out of one of the the usermode 'processes' (not really a process) and into the 'kernel', and one which then was able to corrupt the CPU state during a memcpy operation, allowing him to take control of the CPU as it was copying the next stage from flash RAM.

If you can take control of the boot ROM execution then you can proceed to decrypt the next stage, skip the signature checks and from there do whatever you want in ways that can't be detected remotely - however, the fact that you're using a 2013 Phat device still can be.

Thanks for this writeup as I haven't had time to review the video yet :)

Considering that the PSP is a small ARM processor that presumably takes up little die space, would it make sense for it to them employ TMR with three units in lockstep to detect these glitches? I really doubt that power supply tampering would cause the exact same effect in all three processors (especially if there are differences in their power circuitry to make this harder) and any disrepancies would be caught by the system.

mysteria 3 hours ago [ - ]

Retr0id 3 hours ago [ - ]

The Nintendo switch 2 uses DCLS (Dual-core lockstep) in the BPMP and PSC (PSC is PSP-like but RISC-V). So yes, it helps - I'm unsure if/where msft uses it on their products.

mysteria 2 hours ago [ - ]

DCLS actually makes sense for this scenario as the fault tolerance gained from having three processors isn't needed here. The system can halt when there's a mismatch, it doesn't have to perform a vote and continue running if 2 of 3 are getting the same result.

Also I just thought of this but it should be possible to design a chip where the second processor runs a couple cycles behind the first one, with all the inputs and outputs stashed in fifos. This would basically make any power glitches affect the two CPUs differently and any disrepancies would be easily detected.

> It's mandatory to use the pause key to understand the full details

I was going to say I disagreed but the rest of your comment reminded me that I've accumulated a lot of domain-specific knowledge.

mike_hearn 2 hours ago [ - ]

What I meant is that at points he skips past slides so quick even very fast readers can't absorb every bullet point. I read at ~2-3x the average speed, have lots of domain knowledge and couldn't read fast enough to get every word on every slide. So the pause key is very useful for that even if you know what's coming.

Retr0id 2 hours ago [ - ]

I read at Normal speed but I didn't feel that way when watching. I believe you though, I was just having an XKCD 2501 moment.

https://xkcd.com/2501/

nerdsniper 4 hours ago [ - ]

Thank you, sincerely. My main question now is, what degree of repeatability has Markus achieved so far?

mike_hearn 4 hours ago [ - ]

On Phat consoles? You could turn it into a modchip, if for some reason you wanted to. It'd be repeatable on every boot but might take a while.

The hard work comes after this though. There are lots of software level mitigations MS could use to keep the old devices usable with Xbox Live if they really wanted to. Just because you can boot anything you want doesn't mean you can't be detected remotely, it just makes it harder for MS to do so reliably. You'd be in a constant game of catch-up.