RISC vs CISC. Why do you think a mainframe is so fast?
ARM is great. Those M-series Macs are the only thing I could buy used and put Linux on.
> RISC vs CISC. Why do you think a mainframe is so fast?
This hasn't been true for decades. Mainframes are fast because they have proprietary architectures that are purpose-built for high throughput and redundancy, not because they're RISC. The pre-eminent mainframe architecture these days (z/Architecture) is categorized as CISC.
Processors are insanely complicated these days. Branch prediction, instruction decoding, micro-ops, reordering, speculative execution, cache tiering strategies... I could go on and on but you get the idea. It's no longer as obvious as "RISC -> orthogonal addressing and short instructions -> speed".
> The pre-eminent mainframe architecture these days (z/Architecture) is categorized as CISC.
Very much so. It's largely a register-memory (and indeed memory-memory) rather than load-store architecture, and a direct descendant of the System/360 from 1964.
Everything is RISC after it gets decoded. It isn’t 1990 anymore. The decoder costs maybe 1% performance.
In Haswell, 4.8 W out of the 22.1 W used by the core went to the decoder for integer/ALU instructions [0]. According to this analysis of the entire Ubuntu repository [1], 89% of all instructions came from just 12 distinct instructions (all integer/ALU).
From this we can infer that for most normal workloads, almost 22% of the Haswell core power was used in the decoder. As decoders have gotten wider and more complex in recent designs, I see no reason why this wouldn't be just as true for today's CPUs.
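As a rough back-of-the-envelope check of that inference (the numbers are the ones cited from [0] and [1]; combining the two data sets this way is my own simplification):

    # Decoder share of Haswell core power on integer/ALU work, per [0],
    # weighted by the instruction-mix figure from [1]. Combining the two
    # papers like this is an assumption, not something either paper does.
    core_power_w    = 22.1   # total core power on integer/ALU instructions [0]
    decoder_power_w = 4.8    # portion consumed by the decoder [0]
    int_alu_share   = 0.89   # share of instructions covered by 12 common int/ALU ops [1]

    decoder_fraction = decoder_power_w / core_power_w
    lower_bound = int_alu_share * decoder_fraction  # even if the other 11% decoded for free

    print(f"decoder share on int/ALU code: {decoder_fraction:.1%}")  # ~21.7%
    print(f"lower bound for a typical mix: {lower_bound:.1%}")       # ~19.3%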
[0] https://www.usenix.org/system/files/conference/cooldc16/cool...
[1] https://oscarlab.github.io/papers/instrpop-systor19.pdf
I thought people stopped believing this around 2005 when Apple users finally had to admit that PPC was behind x86.
Even though this was the case for most of the entire history of PPC Macs (I owned two during those years).
https://chipsandcheese.com/p/arm-or-x86-isa-doesnt-matter
It especially doesn't matter because the latest x86 update adds a mode that turns it into ARM.
https://www.intel.com/content/www/us/en/developer/articles/t...
RISC lost its meaning once SPARC added an integer multiply instruction.
Chips and Cheese makes some bad arguments in that article.
Their claim that ARM decoders are just as complex wasn't true then and is even less true now. ARM cut decoder size by 75% going from the A710 to the A715 by dropping legacy 32-bit support. Considering that x86 is far more complex than 32-bit ARM, the difference between an x86 and an ARM decoder implementation is absolutely massive.
They also misuse the decoder power paper (and that paper itself draws a conclusion its own data doesn't support). The data shows that some 22% of total core power goes to the decoder on integer/ALU workloads. Since 89% of all instructions across the entire Ubuntu repositories are just 12 integer/ALU instructions, we can infer that the power cost of the decoder is significant for typical code (I'd consider nearly a quarter of the total power budget significant in any case).
The x86 decoder situation has gotten worse with Golden Cove (with 6 decoders) being infamous for its power draw and AMD fearing power draw so much that they opted for a super-complex dual 4-wide decoder setup. If the decoder power didn't matter, they'd be doing 10-wide decoders like the ARM designers.
The claim that ARM uses uops too is somewhere between a red herring and a false equivalence. ARM uops are certainly less complex to create (otherwise ARM would have kept the uop cache around), and ARM instructions being inherently less complex means that uop encoding is also going to be simpler for a given uarch than it is for x86.
They then have an argument that proves too much when they say ARM has bloat too. If bloat doesn't matter, why did ARM make an entirely new ISA that ditches backward compatibility? Why take any risk to their ecosystem if there's no reward?
They also skip over the fact that objectively bad design exists. NOBODY out there defends branch delay slots. They are universally considered an active impediment to high-performance designs, with ISAs like MIPS going so far as to add duplicate branch instructions without delay slots in order to speed things up. You can't concede that the ISA clearly matters here while also arguing that the ISA never makes any difference at all.
The "all ISAs get bloated over time" is sheer ignorance. x86 has roots going back to the early 1970s before we'd figured out computing. All the basics of CPU design are now stable and haven't really changed in 30+ years. x86 has x87 which has 80-bits because IEEE 754 didn't exist yet. Modern ISAs aren't repeating that mistake. x86 having 8 registers isn't a mistake they are going to make. Neither is 15 different 128-bit SIMD extensions or any of the many other bloated mess-ups x86 has made over the last 50+ years. There may be mistakes, but they are almost certainly going to be on fringe things. In the meantime, the core instructions will continue to be superior to x86 forever.
They also fail to address implementation complexity. Some of the weirdness of x86, like its stricter memory-ordering requirements, gets dragged through the entire system and complicates everything. If this results in just 10% higher cost and 10% longer development time, a RISC company could develop a chip for $5.4B over 4.5 years instead of $6B over 5 years. That's a massive saving and a much lower opportunity cost, plus a compounding head start on their x86 competitor that can be used either to hit the market sooner or to make larger performance jumps each generation.
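As a toy illustration of the compounding claim (the $6B / 5-year baseline and the flat 10% savings are the hypothetical numbers from the paragraph above, not figures from any real chip program):

    # Toy model of the compounding head-start argument above. The baseline
    # cost/schedule and the 10% savings are hypothetical numbers from this
    # comment, not data about any real design effort.
    baseline_cost_b = 6.0   # $B per generation for the x86 competitor
    baseline_years  = 5.0   # years per generation
    savings         = 0.10  # assumed complexity savings for the RISC design

    risc_cost_b = baseline_cost_b * (1 - savings)   # 5.4
    risc_years  = baseline_years  * (1 - savings)   # 4.5

    generations = 4
    lead_years = generations * (baseline_years - risc_years)    # 2.0 years ahead
    saved_b    = generations * (baseline_cost_b - risc_cost_b)  # $2.4B saved
    print(f"after {generations} generations: {lead_years:.1f} years ahead, ${saved_b:.1f}B saved")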
Finally, optimizing something like RISC-V code is inherently easier and faster than optimizing x86 code because there is less weirdness to work around. RISC-V basically has one way to do a given thing, and that way will always be optimized, while x86 often has several ways to do the same thing, each with different tradeoffs that make sense in different scenarios.
As to PPC, Apple didn't sell enough laptops for Motorola to justify putting enough money into the designs to stay competitive.
Today, Apple MacBooks + iPhones move nearly 220M chips per year. For comparison, total laptop sales last year were around 260M. If Apple had Motorola make a chip today, Motorola would have the money to build a PPC chip that could compete with and surpass what x86 offers.
Fair enough.
And don’t forget that Apple can do things like completely remove all of the hardware that supports 32 bit code and tell developers to just deal with it.
At least my G5 helped keep my room warm in the winter.
It's fun watching things swing back and forth over time. I remember having those mini-fridge-sized Sun servers, all running RISC SPARC-based CPUs if I remember correctly. I wonder if there would be some merit in RISC-based Linux servers, like maybe lower power consumption? I forget the pros/cons of RISC vs CISC CPUs.