Which RISC-V implementation is considered fast?

> Which RISC-V implementation is considered fast?

SpacemiT K3 is 2010 MacBook performance single-core, 2019 MacBook Air multi-core, and better than M4 Apple Silicon for AI.

So I guess it depends on what you are going to do with it.

M4 is 38 TOPS at INT8 precision, whereas SpacemiT K3 is 60 TOPS at INT4 precision, so at best they would be equal in "AI" performance. But they are not, because the rest of the K3 chip is much less capable than M4 (as I would expect).
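To make the precision mismatch concrete, here is a rough sketch of the arithmetic. It assumes INT4 throughput is about twice INT8 throughput on the same hardware, which is a common rule of thumb and not a published K3 spec; the function name and the `int4_speedup` parameter are made up for illustration.

```python
def int8_equivalent_tops(tops: float, precision_bits: int,
                         int4_speedup: float = 2.0) -> float:
    """Convert a quoted TOPS figure to a rough INT8-equivalent figure.

    int4_speedup is an ASSUMPTION: how much faster the unit runs INT4
    than INT8. 2.0 is a rule of thumb, not a vendor number.
    """
    if precision_bits == 8:
        return tops
    if precision_bits == 4:
        return tops / int4_speedup
    raise ValueError("only INT8 and INT4 figures are handled in this sketch")


# Figures quoted in the comment above.
m4_int8 = int8_equivalent_tops(38, precision_bits=8)
k3_int8 = int8_equivalent_tops(60, precision_bits=4)

print(f"M4: {m4_int8:.0f} TOPS (INT8)")
print(f"K3: {k3_int8:.0f} TOPS (INT8-equivalent, assuming 2x INT4 speedup)")
```

Under that assumption the K3 figure lands in the same ballpark as the M4 figure rather than ahead of it, and that is before memory bandwidth comes into play.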

E.g. M4 total system memory bandwidth is 120GB/s whereas K3 is 51GB/s; single-core memory bandwidth is 100-120GB/s vs ~30GB/s. M4 has 10 CPU cores and a neural engine with 16 cores, whereas K3 has 8 CPU cores and 8 "AI" cores. K3 clock frequency is almost half that of M4, etc.

But anyway thanks for sharing, always good to learn about new hardware.

DC-ROMA 2 is at Raspberry Pi 4 level of performance, last I heard.

I remember taking down some notes wrt SiFive P870 specs, comparing them to x86_64, and reaching the same conclusion: narrower core width (4-wide vs 8-wide), lower clock frequency (peaks at 3GHz) and no turbo (?), limited support for vector execution (128-bit vs 512-bit), limited L1 bandwidth (1x 128-bit load/cycle?), and limited FP compute (2x 128-bit vs 2x 512-bit). The load queue is also inconveniently small at 48 entries (affecting the already limited load bandwidth), and it is unclear what the system memory bandwidth is and how it scales wrt the number of cores (L3 contention), although for the latter they seem to do what AMD is doing (exclusive L3 cache per chiplet).
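For a sense of scale on the FP compute point, a back-of-the-envelope sketch of peak FLOPs per cycle per core from the pipe counts and vector widths quoted above. It assumes FP64 elements and that every pipe can issue one fused multiply-add per lane per cycle; both are assumptions for illustration, not confirmed P870 or x86_64 specs, and the function name is hypothetical.

```python
def peak_flops_per_cycle(pipes: int, vector_bits: int,
                         element_bits: int = 64, fma: bool = True) -> int:
    """Theoretical peak FLOPs per cycle for one core.

    ASSUMPTIONS: each pipe issues one vector op per cycle, and an FMA
    counts as two floating-point operations per lane.
    """
    lanes = vector_bits // element_bits
    ops_per_lane = 2 if fma else 1
    return pipes * lanes * ops_per_lane


# Widths quoted in the comment above: 2x 128-bit vs 2x 512-bit.
narrow = peak_flops_per_cycle(pipes=2, vector_bits=128)
wide = peak_flops_per_cycle(pipes=2, vector_bits=512)

print(f"2x 128-bit: {narrow} FP64 FLOPs/cycle")
print(f"2x 512-bit: {wide} FP64 FLOPs/cycle")
print(f"ratio: {wide // narrow}x per cycle, before any clock difference")
```

This is only the per-cycle ceiling; sustained throughput also depends on the load bandwidth and load queue limits mentioned in the same comment.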

SpacemiT K3 is about the same performance as a Rockchip RK3588. So, 4 years ago?

Except the K3 kills it on AI (60 TOPS).