The instruction decoder was a large part of the die in 1985. Today you won't be able to identify it in a die photo. In a world with gigantic vector register files, the area used by decode simply is not relevant. Anyway, x86 does not save storage space: x86_64 code tends to be larger than ARMv8 code.
The various bits that get tacked on for prefetch and branch prediction are fairly large too, given how much caching they carry, and I think that is often what people are actually counting when they measure decode power usage. That’s going to be the case in any arch besides something like a DSP without any kind of dynamic dispatch.
I think it's safe to say that a modern x86 branch predictor with its BTBs is significantly larger than the decode block.
Sure, but branch prediction is (as far as we know) a necessary evil. Decode complexity simply isn't.
Right, but decode complexity doesn't matter because of the giant BTB and such. At least that's what I understand.
For the cores working hardest to achieve the absolute lowest CPI running user code, this is true. But these days the computers have computers in them to manage the system, and these kinds of statements aren’t necessarily true for these “inner cores” that aren’t user accessible.
“RTKit: Apple's proprietary real-time operating system. Most of the accelerators (AGX, ANE, AOP, DCP, AVE, PMP) run RTKit on an internal processor. The string "RTKSTACKRTKSTACK" is characteristic of a firmware containing RTKit.”
https://asahilinux.org/docs/project/glossary/#r
And those cores do not run x86.
I was pretty surprised to find out that the weird non-architectural cores in a Core or Xeon really do run x86 code.
You say it's irrelevant, but that's not the same as being necessary. These decode components are simply not necessary, whereas branch prediction actually makes the processor faster.
The high-end CPU designs, be it ARMv7, AArch64, RISC-V or x86(-64), have parallelized pre-decoding hardware and buffers for decoded micro-instructions because that too, apparently, speeds up execution. From what I understand, the differences in those subsystems that are due to ISA baroqueness are, again, minuscule.
I've been reading up on this. The differences are indeed minimal. Still not zero, but not the explanation for why M-series Macs outperform Intel x86 on power consumption: https://chipsandcheese.com/p/why-x86-doesnt-need-to-die