It just fundamentally does not make sense to compare languages by comparing codegen backends. GCC and LLVM do not produce the same code for equivalent code, especially when optimizations are applied. It's an apples-to-oranges comparison.
Using Clang instead of GCC, the comparison becomes slightly better, at least for microbenchmarks that don't rely too much on libraries.
These benchmarks are still useful from a practical viewpoint - answering the question "what's the expected performance bracket of using language X in real projects today". But it doesn't say anything fundamental about the language design or even the quality of the implementation.