This is an often-repeated myth, and it is only half true.

On Zen 4 and Zen 4c, the vector datapaths for most vector instructions have the same width as in Intel's best Xeons, i.e. they can execute two 512-bit operations per clock cycle.

The exceptions, where AMD has half the throughput, are the vector loads and stores from the first-level cache and the FMUL/FMA instructions: the most expensive Intel Xeons can execute two 512-bit FMUL/FMA operations per clock cycle, while Zen 4/4c can execute only one FMUL/FMA plus one FADD per clock cycle.
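
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in C, just printing the arithmetic implied by the pipe counts above (assumed configuration: 2 x 512-bit FMA pipes on the big Xeons, 1 x 512-bit FMA pipe plus 1 x 512-bit FADD pipe on Zen 4/4c; the exact mix varies by SKU):

    /* Per-core double-precision FLOPs per clock implied by the pipe counts
       described above; each FMA lane counts as 2 FLOPs, each FADD lane as 1. */
    #include <stdio.h>

    int main(void) {
        const int lanes = 512 / 64;                       /* 8 doubles per 512-bit vector */
        int xeon_flops = 2 * lanes * 2;                   /* 2 FMA pipes            = 32  */
        int zen4_flops = 1 * lanes * 2 + 1 * lanes * 1;   /* 1 FMA + 1 FADD pipe    = 24  */
        printf("Xeon  : %d DP FLOPs per clock per core\n", xeon_flops);
        printf("Zen 4 : %d DP FLOPs per clock per core\n", zen4_flops);
        return 0;
    }

So in code whose instruction mix can also use the separate FADD pipe, Zen 4 reaches roughly 75% of the Intel per-core peak, and only in pure FMA code does it drop to half.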

So only the path between the L1 cache and the vector registers, together with the floating-point multiplier, is half-width on Zen 4/4c, while the rest of the datapaths have the same width (2 x 512-bit per clock cycle) on both Zen 4/4c and Intel's Xeons.
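
If you want to check the FMA-pipe difference yourself rather than take datasheets at face value, a throughput micro-benchmark along these lines is enough. This is only a rough sketch, not a polished benchmark: it assumes an AVX-512-capable CPU, gcc or clang with -O2 -mavx512f, and something like "perf stat" to read the sustained instructions per cycle:

    /* FMA throughput probe sketch: eight independent accumulator chains,
       so the loop-carried FMA latency is hidden and the sustained rate
       reflects the number of 512-bit FMA pipes, not their latency. */
    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        const __m512d a = _mm512_set1_pd(1.0000001);
        const __m512d b = _mm512_set1_pd(0.9999999);
        __m512d c0 = b, c1 = b, c2 = b, c3 = b, c4 = b, c5 = b, c6 = b, c7 = b;

        for (long n = 0; n < 200000000L; n++) {
            /* 8 independent 512-bit FMAs per iteration */
            c0 = _mm512_fmadd_pd(a, c0, b);
            c1 = _mm512_fmadd_pd(a, c1, b);
            c2 = _mm512_fmadd_pd(a, c2, b);
            c3 = _mm512_fmadd_pd(a, c3, b);
            c4 = _mm512_fmadd_pd(a, c4, b);
            c5 = _mm512_fmadd_pd(a, c5, b);
            c6 = _mm512_fmadd_pd(a, c6, b);
            c7 = _mm512_fmadd_pd(a, c7, b);
        }

        /* Fold the accumulators so the optimizer cannot delete the loop */
        __m512d s = _mm512_add_pd(_mm512_add_pd(c0, c1), _mm512_add_pd(c2, c3));
        s = _mm512_add_pd(s, _mm512_add_pd(_mm512_add_pd(c4, c5), _mm512_add_pd(c6, c7)));
        printf("checksum: %f\n", _mm512_reduce_add_pd(s));
        return 0;
    }

With eight independent chains the FMA latency should be fully hidden on these cores, so the loop should sustain roughly two 512-bit FMAs per cycle on the top Xeons (and on full-width Zen 5), and roughly one per cycle on Zen 4/4c.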

The server and desktop variants of Zen 5/5c (and also the laptop Fire Range and Strix Halo CPUs) double the width of all vector datapaths, exceeding the per-core vector throughput of all past and current Intel CPUs. Only Intel's server CPUs expected to launch in 2026 (Diamond Rapids) are likely to be faster than Zen 5, but by then AMD may also have launched Zen 6, so it remains to be seen which will be ahead by the end of 2026.