The quality of the benchmark code is... not great. This seems like Zig written by someone who doesn't know Zig or asked Claude to write it for them. Hell, actually Claude might do a better job here.
In short, I wouldn't trust these results for anything concrete. If you're evaluating which language is a better fit for your problem, craft your own benchmark tailored for that problem instead.
So far, the best benchmark seems to be https://plummerssoftwarellc.github.io/PrimeView/, although it is a very single-thread-biased test.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Modern C# has many low-level knobs for zero allocation, hardware intrinsics, runtime devirtualization of calls, etc. (still in a safe way, though it also supports unsafe): SIMD (Vector<T>), memory spans, stackalloc, source generators (which help with very efficient JSON), and so on.
Most of all: C# has very nice frameworks and tooling (Rider).
Go is consistently beaten by C# in both the Benchmarks Game and the TechEmpower benchmarks.
I don't know why this is downvoted, because the statement is not wrong (https://benchmarksgame-team.pages.debian.net/benchmarksgame/...). Times have changed, modern .NET is very fast and is getting faster still (https://devblogs.microsoft.com/dotnet/performance-improvemen...).
It's not really surprising given the implementations. The C# stdlib just exposes more low-level levers here (quick look, correct me if I'm wrong):
For one, the C# code is explicitly using SIMD (System.Numerics.Vector) to process blocks, whereas Go is doing it scalar. It also uses a read-only FrozenDictionary, which is heavily optimized for fast lookups compared to a standard map. Parallel.For effectively maps to OS threads, avoiding the Go scheduler's overhead (like preemption every few ms), which is small but still unnecessary for pure number crunching.

But a bigger bottleneck is probably synchronization: the Go version writes to a channel in every iteration. Even buffered, that implies internal locking/mutex contention. C# is just writing to pre-allocated memory indices in disjoint chunks, so there's no synchronization at all.
In other words, the benchmark doesn't even use the same hardware for each run?
If you're referring to the SIMD aspect (I assume the other points don't apply here): it depends on your perspective.
You could say yes, because the C# benchmark code is utilizing vector extensions on the CPU while Go's isn't. But you could also say no: Both are running on the same hardware (CPU and RAM). C# is simply using that hardware more efficiently here because the capabilities are exposed via the standard library. There is no magic trick involved. Even cheap consumer CPUs have had vector units for decades.
C# is great, but look at the implementations. The JVM is set up wrong, so Java could perform better than what is benchmarked. Hell, with Python you'd probably use Celery, NumPy, or ctypes to do this much faster.
So overall the benchmarks are kind of useless.
Zig is being compiled in ReleaseSafe mode, so there's lots of bounds checking going on.