They're big, expensive chips with a focus on power efficiency. AMD's and Intel's big, expensive chips tend to be optimized for higher power ranges, so they don't compete well on efficiency, while their more power-efficient chips tend to be optimized for size/cost.
If you're willing to spend a bunch of die area (which directly translates into cost) you can get good numbers on the other two legs of the Power-Performance-Area triangle. The issue is that the market position of Apple's competitors is such that it doesn't make as much sense for them to make such big and expensive chips (particularly CPU cores) in a mobile-friendly power envelope.
Per core, Apple’s performance cores are no bigger than AMD’s Zen cores. So it’s a myth that they’re fast and efficient only because they are big.
What makes Apple Silicon chips big is the fast GPU bolted onto them. If you include the die of a discrete GPU alongside an x86 chip, the total would be the same size as or bigger than an M-series chip.
You can look at Intel’s Lunar Lake as an example: it’s physically bigger than an M4 but slower in CPU, GPU, and NPU performance, with far worse efficiency.
Another comparison is AMD Strix Halo. Despite being ~1.5x bigger than the M4 Pro, it has worse efficiency, ST performance, and GPU performance. It does have slightly more MT.
Is it not true that the instruction decoder is always active on x86, and is quite complex?
The equivalent decoder for AArch64 is vastly simpler.
That is one obvious architectural drawback for power efficiency: a legacy instruction set with variable-length instructions, two FPUs (x87 and SSE), 16-bit compatibility with segmented memory, and hundreds of otherwise-unused opcodes.
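To see the decode asymmetry concretely, here's a toy Python sketch of instruction-boundary finding, not real decoder logic; `length_of` is a hypothetical helper standing in for x86's length-determination problem:

```python
# Toy sketch of finding instruction boundaries; illustrative only.

def aarch64_boundaries(code: bytes) -> list[int]:
    # Every A64 instruction is exactly 4 bytes, so all start offsets
    # are known up front and a wide decoder can work on them in parallel.
    return list(range(0, len(code), 4))

def x86_boundaries(code: bytes, length_of) -> list[int]:
    # x86 instructions are 1-15 bytes; the start of instruction N+1 is
    # only known after working out the length of instruction N (prefixes,
    # opcode, ModRM, SIB, displacement, immediate). A naive decoder is
    # therefore serial. Real cores speculate on lengths and cache
    # predecode results / micro-ops to hide this cost.
    offsets, pos = [], 0
    while pos < len(code):
        offsets.append(pos)
        pos += length_of(code, pos)  # hypothetical length-decoding helper
    return offsets
```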
How much legacy must Apple implement? Non-kernel AArch32 and Thumb2?
Edit: think about it... R4000 was the first 64-bit MIPS in 1991. AMD64 was introduced in 2000.
AArch64 emerged in 2011, and in taking their time, the designers avoided the mistakes made by others.
There's no AArch32 or Thumb support (A32/T32) on M-series chips. AArch64 (technically A64) is the only supported instruction set. Fun fact: this makes it impossible to run Mario Kart 8 via virtualization on Macs without software translation, since it's A32.
How much that does for efficiency I can't say, but I imagine it helps, especially given just how damn easy it is to decode.
It actually doesn't make much difference: https://chipsandcheese.com/i/138977378/decoder-differences-a...
I had not realized that Apple did not implement any of the 32-bit ARM environment, but that cuts the legs out from under this argument in the article:
"In Anandtech’s interview, Jim Keller noted that both x86 and ARM both added features over time as software demands evolved. Both got cleaned up a bit when they went 64-bit, but remain old instruction sets that have seen years of iteration."
I still say that x86 must run two FPUs all the time, and that has to cost some power (AMD must run three - it also has 3dNow).
Intel really couldn't resist adding instructions with each new chip (MMX, PAE for 32-bit, and many more on the long list that I don't recognize), which are now mostly baggage.
> I still say that x86 must run two FPUs all the time, and that has to cost some power (AMD must run three - it also has 3dNow).
Legacy floating-point and SIMD instructions exposed by the ISA (and extensions to it) don't have any bearing on how the hardware works internally; on modern x86 cores, x87 and SSE instructions are decoded into micro-ops that execute on the same shared floating-point units, so there aren't separate FPUs burning power.
Additionally, AMD processors haven't supported 3DNow! in over a decade -- K10 was the last processor family to support it.
Oh wow, I need to dig way deeper into this but wonderful resource - thanks!
> Despite being ~1.5x bigger than the M4 Pro
Where are you getting M4 die sizes from?
It would hardly be surprising, given the Max+ 395 has more cores, and on average better ones, fabbed on 5nm versus the M4's 3nm. Die size is mostly GPU, though.
Looking at some benchmarks:
> slightly more MT.
AMD's multicore Passmark score is more than 40% higher.
https://www.cpubenchmark.net/compare/6345vs6403/Apple-M4-Pro...
> worse efficiency
The AMD is on an older fab process and does not have P/E cores. What are you measuring?
> worse ST performance
The P/E design choice gives different trade-offs; e.g., AMD has much higher average single-core perf.
> worse GPU performance
The AMD GPU:
- 14.8 TFLOPS vs. the M4 Pro's 9.2 TFLOPS
- 19% higher 3DMark
- 34% higher Geekbench 6 OpenCL
Although a much crappier Blender score. I wonder what that's about.
https://nanoreview.net/en/gpu-compare/radeon-8060s-vs-apple-...
The GPUs themselves are roughly equal. However, Strix Halo is still a bigger SoC.
> TFLOPs are not the same between architectures.
Shouldn't they be the same if we are speaking about the same precision? For example, [0] shows the M4 Max at 17 TFLOPS FP32 vs. the MAX+ 395 at 29.7 TFLOPS FP32 - not sure what exact operation was measured, but at least it should be the same operation. Hard to make definitive statements without access to both machines.
[0] https://www.cpu-monkey.com/en/compare_cpu-apple_m4_max_16_cp...
Apple doesn't even disclose TFLOPS for the M4 Max, so no clue where that website got its numbers from.
TFLOPS also aren't quoted the same way between generations. For example, Nvidia often quotes sparsity TFLOPS, which doubles the dense TFLOPS previously reported. I think AMD probably does the same for consumer GPUs.
Another example is the Radeon RX Vega 64, which had 12.7 TFLOPS FP32. Yet the Radeon RX 5700 XT, with just 9.8 TFLOPS FP32, absolutely destroyed it in gaming.
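For what it's worth, the headline figures are usually just theoretical peaks: ALU count x 2 (counting an FMA as two ops) x boost clock. A quick sketch using the publicly quoted shader counts and boost clocks for those two cards, with the sparsity factor at the end showing how quoted numbers can silently double:

```python
# Theoretical peak FP32 throughput as vendors usually quote it.

def peak_fp32_tflops(alus: int, boost_ghz: float) -> float:
    # One fused multiply-add per ALU per clock counts as 2 FLOPs.
    return alus * 2 * boost_ghz / 1000

print(peak_fp32_tflops(4096, 1.546))  # Vega 64:    ~12.7 TFLOPS
print(peak_fp32_tflops(2560, 1.905))  # RX 5700 XT: ~9.8 TFLOPS

# Marketing numbers can also silently change definition: a "sparsity"
# figure (as Nvidia quotes for tensor ops) is simply the dense peak x2.
dense_tflops = 9.8
sparsity_tflops = dense_tflops * 2
```

Which is exactly why a lower-TFLOPS card on a newer architecture can still win in games: the peak says nothing about how often the ALUs are actually fed.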
What a waste of time.
"directionally correct"... so you don't know and made up some numbers? Great.
AMD doesn't "endorse benchmarks", especially not fucking Geekbench for multi-core. No one could, because it's famously nonsense at higher core counts. AMD's decade-old beef with Sysmark was about pro-Intel bias.
Welcome to the world of chip discussions. I've never taken apart an M4 Pro computer and measured the die myself, and it appears no one on the internet has either. However, we can infer a lot of it based on previously known facts. In this case, we know the M1 Pro's die size is around 250mm2.
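For illustration, the inference looks something like this back-of-the-envelope; every scaling factor below is an assumption I'm labeling as such, not a measurement:

```python
# Back-of-the-envelope die-size inference. The M1 Pro figure comes from
# public teardowns; the scaling factors are illustrative assumptions.

m1_pro_die_mm2 = 250
logic_density_gain = 1.4   # assumed N5 -> N3 logic density improvement
logic_fraction = 0.6       # assumed share of the die that is logic
sram_io_fraction = 0.4     # SRAM and IO barely shrink on newer nodes

# Hypothetical estimate if the M4 Pro's layout were otherwise similar:
estimate_mm2 = m1_pro_die_mm2 * (logic_fraction / logic_density_gain
                                 + sram_io_fraction)
print(f"~{estimate_mm2:.0f} mm2")  # ~207 mm2 under these assumptions
```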
Geekbench is the main benchmark AMD tends to use: https://videocardz.com/newz/amd-ryzen-5-7600x-has-already-be... The reason is that Geekbench correlates highly with SPEC, which is the industry standard.
Their "main benchmark"? Stop making things up. It's no more than tragic fanboy addled fraud at this point.
That three-year-old press release refers to SINGLE-CORE Geekbench, not the defective multicore version that doesn't scale with core counts. Given AMD's main USP is core counts, it would be an... unusual choice.
AMD marketing uses every other product under the sun too (no doubt whatever gives the better-looking numbers)... including Passmark; e.g., it's on this Strix Halo page:
https://www.amd.com/en/products/processors/ai-pc-portfolio-l...
So I guess that means Passmark is "endorsed" by AMD too eh? Neat.
The industry has moved past Passmark because it does not correlate with actual real-world performance.
The standard is SPEC, which correlates well with Geekbench.
https://medium.com/silicon-reimagined/performance-delivered-...
Every time there is a discussion on Apple Silicon, some uninformed person always brings up Passmark, which is completely outdated.
Enough. You don't know what you are talking about.
What's with posting 5-year-old Medium articles about a different version of Geekbench? Geekbench 5 had different multicore scaling, so if you want to argue that version was so great, then you're also arguing against Geekbench 6, because the two don't even match.
https://www.servethehome.com/a-reminder-that-geekbench-6-is-...
"AMD Ryzen Threadripper 3995WX, a huge 64 core/ 128 thread part, was performing at only 3-4x the rate of an Intel D-1718T quad-core part, even despite the fact it had 16x the core count and lots of other features."
"With the transition from Geekbench 5 to Geekbench 6, the focus of the Primate Labs team shifted to smaller CPUs"
GB6 measures MT the way most consumer applications actually use MT, whereas GB5 was embarrassingly parallel. GB6 reflects real-world usage better.
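The scaling gap falls straight out of Amdahl's law. A toy calculation, where the parallel fractions are illustrative assumptions (not anything Primate Labs publishes):

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
# where p is the fraction of the workload that parallelizes.

def speedup(p: float, n_cores: int) -> float:
    return 1 / ((1 - p) + p / n_cores)

print(speedup(1.00, 64))  # 64.0 -- embarrassingly parallel, GB5-style
print(speedup(0.75, 64))  # ~3.8 -- a 25% serial share caps the gain
print(speedup(0.75, 4))   # ~2.3 -- a quad core already gets most of it
```

With a 25% serial share the ceiling is 1/(1 - 0.75) = 4x no matter how many cores you add, which loosely matches the 3-4x gap ServeTheHome saw between the 64-core and quad-core parts.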
Your source is an article based on someone finding a Geekbench result for a just-released CPU, and you're somehow trying to say it's from AMD itself and an endorsed benchmark, huh.
Those are AMD's marketing slides.