I think it's safe to say that a modern x86 branch predictor with its BTBs is significantly larger than the decode block.

Sure, but branch prediction is (as far as we know) a necessary evil. Decode complexity simply isn't.

Right, but decode complexity doesn't matter because of the giant BTB and such. At least that's what I understand.

For the cores working hardest to achieve the absolute lowest CPI running user code, this is true. But these days computers have computers in them to manage the system, and these kinds of statements aren’t necessarily true for those “inner cores” that aren’t user accessible.

“RTKit: Apple's proprietary real-time operating system. Most of the accelerators (AGX, ANE, AOP, DCP, AVE, PMP) run RTKit on an internal processor. The string "RTKSTACKRTKSTACK" is characteristic of a firmware containing RTKit.”

https://asahilinux.org/docs/project/glossary/#r
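
For what it's worth, checking a firmware blob for that marker is easy. A rough sketch, assuming you already have the raw image as a file on disk; the function names here are made up for illustration, and a hit is only a hint that the blob contains RTKit, not proof:

```python
from pathlib import Path

# Marker string quoted in the Asahi glossary entry above.
RTKIT_MARKER = b"RTKSTACKRTKSTACK"


def find_rtkit_marker(blob: bytes) -> int:
    """Return the byte offset of the first marker occurrence, or -1 if absent."""
    return blob.find(RTKIT_MARKER)


def firmware_looks_like_rtkit(path: str) -> bool:
    """Hypothetical helper: read the whole file and report whether the marker appears."""
    return find_rtkit_marker(Path(path).read_bytes()) != -1
```

Reading the whole file into memory is fine for firmware-sized images; for anything much larger you would want to scan in chunks instead.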

And those cores do not run x86.

I was pretty surprised to find out that the weird non-architectural cores in a Core or Xeon really do run x86 code.