re comments:

yes of course this is apples to oranges but that's kind of the point

it shows the vast span between specialized hardware throughput IFF you can use an A100 at its limit vs overhead of one of the most popular programming languages in use today that eventually does the "same thing" on a CPU

the interesting thing is why that is so

CPU vs GPU (latency vs throughput), boxing vs dense representation, interpreter overhead, scalar execution, layers upon layers, …

A100 FP32 throughput “at its limit”: 19.5 TFLOP/s.

AMD EPYC 9965 FP32 throughput “at its limit”: 41.2 TFLOP/s (192 cores x 64 FP32 FLOP/cycle/core x 3.35GHz).

That's also a CPU that came out four years later than the A100. The contemporaneous B200 is not optimized for FP32 and does 74.45 TFLOP/s. For FP16 it's at ~2 PFLOP/s.

The point is that modern CPUs are not as slow as most DL people think. Roughly 10x slower but with a lot more memory.

EPYC 9965: 614GBps of 12-channel DDR5-6400

A100: 1935GBps of HBM2e

Most of those FLOPS are constrained by memory bandwidth.

> Most of those FLOPS are constrained by memory bandwidth

I believe inference with large enough batch size is almost always compute bound, simply due to algorithmic complexity.

Each step of tiled matric multiplication with square tiles of size N^2 takes O(N^2) memory loads and O(N^3) compute operations. With N = 32 or 64, you will likely saturate compute even on iGPUs with DDR4 or DDR5 memory pretending to be VRAM.

[deleted]

A100: 312 TFLOP/s for FP16

but it is very impressive how far modern CPUs get as well (also in smart phones!)

Intel Xeon 6980P: 128 cores x 1024 FP16 FLOP/cycle/core x 3.2 GHz: 419 TFLOP/s

I'm not saying "GPU more brrt than CPU"

I found the comparison interesting

on Intel Xeon 690P with 419 TFLOP/s it is still (maybe even more?) interesting to ask:

how much throughput can you reach with Python, Python with lib x, y, z, with C++ like this, with C++ like that etc etc and why?

no?

No one in their right mind would use pure Python to do matrix multiplication. It’s like using a screwdriver to hammer nails into wood.

But this discussion is even more bizarre than comparing a screwdriver to a hammer, it’s like comparing a screwdriver to a nail.