Many cost relationships from TFA have already been more or less true for the 32-bit CPUs launched after 1990 and they all became true for the 32-bit high-end CPUs launched after 2000 (like Intel Pentium 4 and AMD Athlon XP), when the difference between the CPU clock frequency and the DRAM latency became almost as high as today.
Only for the 32-bit CPUs used in microcontrollers, which may have clock frequencies under 100 MHz and which may lack a cache hierarchy, the cost differences between many kinds of operations may collapse.
For instance even for not too old 32-bit CPUs it is right to classify the instructions in the following groups, based on their cost in clock cycles:
1. Simple integer operations with operands in registers
2. Loads from the L1 cache memory and simple floating-point operations, like addition and multiplication
3. Loads from the L2 cache memory, division (integer or floating-point), square root and mispredicted branches
4. Loads from the L3 cache memory and atomic read-modify-write operations (like atomic exchange, atomic fetch-and-add, atomic compare-and-swap)
5. Loads from the main memory
This classification matches the chart from TFA.
That's what people don't really understand about CPUs these days. DRAM is stuck on 10nm (and even that was a big effort to move there). The capacitor circuit DRAM uses doesn't work if you reduce the size much more, and so it can't be scaled down, and this is not changing. We're pretty much stuck on memory speed almost regardless of chip advances (at least for the individual chips, but we're already using 8 and 16 and more chips at the same time. Something like for your byte: bit 1 -> chip 1, bit 2 -> chip 2, ... So instantaneous read is not actually reading 8 adjecent memory cells but 1 parallellized read)
I wonder if/when we'll reach the point that it's cheaper to manufacture SRAM (with 6 transistors per bit if I recall correctly) than it is to manufacture DRAM. (with 1 transistor and 1 capacitor per bit)
The transistors get smaller every year. The capacitors, like you say, don't anymore. At some point those 5 extra transistors will be cheaper than the capacitor, unless Moore's Law well and truly bites it.