Then let's look at C++, which in some areas has a higher abstraction level than C, but in some areas can still be faster than C. This is due to templates: they inline the library code, which can then be optimized for the actual types, rather than relying on library functions that take void pointers, which require a function call and compile to a less optimized form.
The main reason Python is slower is that in most contexts it is used as an interpreted/bytecode-compiled language running on its own VM in CPython.
What you wrote about C versus C++ is largely untrue. C++ is not faster than C, even when “using library functions which use void pointers”. There is nothing about a void pointer that prevents inlining in either C or C++. There is also no “optimized on actual types” for C++ and not C, since everything is compiled into a low level intermediate language in the compiler (typically three-address code). All of the C++ types are absent at that point. The low level intermediate representation is then what receives optimization.
For example, GCC will outright inline both bsearch() and the provided comparator in cases where it can see the definition of the comparator, such that there are no function calls done to either bsearch() or the comparator. C compilers will do this for a number of standard library functions and even will do it for non-library functions in the same file since they will inline small functions if allowed to do inlining. When the functions are not in the same file, you need LTO to get the inlining, but the same applies to C++.
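To make the bsearch() case concrete, here is a minimal sketch of the setup being described: a void-pointer comparator defined in the same translation unit as the call. The names compare_ints and find_sorted are made up for illustration. Whether a given compiler actually inlines the search and the comparator depends on its version and flags, so check the assembly rather than taking it on faith.

```cpp
#include <cassert>
#include <cstdlib>

// Comparator with the classic void-pointer signature that bsearch() expects.
// It lives in the same translation unit as the call, so the optimizer can
// see its body.
static int compare_ints(const void *lhs, const void *rhs) {
    const int a = *static_cast<const int *>(lhs);
    const int b = *static_cast<const int *>(rhs);
    return (a > b) - (a < b);
}

// Hypothetical helper: index of key in a sorted array, or -1 if absent.
long find_sorted(const int *data, std::size_t count, int key) {
    assert(data != nullptr || count == 0);
    const void *hit = std::bsearch(&key, data, count, sizeof(int), compare_ints);
    if (hit == nullptr) {
        return -1;
    }
    return static_cast<long>(static_cast<const int *>(hit) - data);
}
```

Calling find_sorted on the sorted array {1, 3, 5, 7, 9} with key 7 returns 3, and with key 4 returns -1. Nothing about the void pointers stops the compiler from seeing through the call.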
That said, I have never seen assembly output from a C++ compiler for C++ using C++ specific language features that was more succinct than equivalent C. I am sure it happens, but the C++ language is just full of bloat. C++ templates usually result in more code rather than less, since the compiler must work much harder to optimize the result and opportunities are often lost. It is also incredibly easy for overhead to be hiding in the abstractions, especially if you have a thread safe class used in a single threaded context as part of a larger multithreaded program. The compiler will not optimize the thread safety overhead away. You might not believe that C++ language features add bloat, so I will leave you with this tidbit:
https://twitter.com/TimSweeneyEpic/status/122307740466037145...
Tim Sweeney’s team had C++ code that not only did not use exceptions, but explicitly opted out of them with noexcept. They got a 15% performance boost from turning off exceptions in the compiler, for no apparent reason. C, not having exceptions, does not have this phantom overhead.
Yes, a C compiler will do something special for some library functions. However, for my library, I can use higher level abstractions to implement my algorithms etc. and benefit from the optimisations as well.
And yes, one can write somewhat generic C code, which can be inlined as well, but that's not as high-level an abstraction (i.e. not type safe, but built around void pointers).
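A rough sketch of the difference in type safety, with made-up names (min_element_erased, less_double, min_element_typed). Both versions run the same algorithm. The first one erases the element type behind void pointers, so a mismatched comparator would compile silently. The second one has the compiler check the element type and the comparator together.

```cpp
#include <cassert>
#include <cstddef>

// C-style generic minimum: works for any element type, but nothing checks
// that the comparator matches what base actually points to.
const void *min_element_erased(const void *base, std::size_t count, std::size_t size,
                               int (*less)(const void *, const void *)) {
    assert(count > 0);
    const unsigned char *first = static_cast<const unsigned char *>(base);
    const unsigned char *best = first;
    for (std::size_t i = 1; i < count; ++i) {
        const unsigned char *candidate = first + i * size;
        if (less(candidate, best)) {
            best = candidate;
        }
    }
    return best;
}

// Comparator for doubles in the erased style.
int less_double(const void *lhs, const void *rhs) {
    return *static_cast<const double *>(lhs) < *static_cast<const double *>(rhs);
}

// Template version: same loop, but T and Less are checked at compile time.
template <typename T, typename Less>
const T &min_element_typed(const T *data, std::size_t count, Less less) {
    assert(count > 0);
    const T *best = data;
    for (std::size_t i = 1; i < count; ++i) {
        if (less(data[i], *best)) {
            best = &data[i];
        }
    }
    return *best;
}
```

Passing less_double together with an int array to min_element_erased compiles without complaint and reads garbage. The template version rejects the equivalent mistake at compile time. Whether the two end up as different machine code is a separate question.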
Noexcept doesn't mean "this function doesn't use exceptions", it means "this function doesn't throw exceptions". The difference being that a child function can throw, but std::terminate will be called once a noexcept function is unwound. There's no standard way to specify the former, only compiler flags.
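A small sketch of that distinction, using hypothetical functions (parse_positive, parse_or_default, parse_or_terminate). The noexcept operator only reports what a function declares, not what its callees do.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical child function that may throw.
int parse_positive(int value) {
    if (value <= 0) {
        throw std::invalid_argument("value must be positive");
    }
    return value;
}

// noexcept is a promise about what escapes this function, not about what it
// calls. The exception is caught inside, so the promise holds.
int parse_or_default(int value, int fallback) noexcept {
    try {
        return parse_positive(value);
    } catch (const std::invalid_argument &) {
        return fallback;
    }
}

// This also compiles: a noexcept function calling a throwing child. If
// parse_positive throws here, std::terminate is called instead of the
// exception reaching the caller.
int parse_or_terminate(int value) noexcept {
    return parse_positive(value);
}

// The noexcept operator reports the declared promise at compile time.
static_assert(!noexcept(parse_positive(1)), "child may throw");
static_assert(noexcept(parse_or_default(1, 0)), "declared noexcept");
static_assert(noexcept(parse_or_terminate(1)), "declared noexcept");
```

So parse_or_default(-5, 1) returns 1, while parse_or_terminate(-5) would end the program. The compiler still has to be prepared for the throwing child in both cases, unless exceptions are disabled with a compiler flag.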
C++ can be used to write code that generates assembly equivalent to pretty much any C. A lot of standards committee work goes into ensuring that's possible. The trade-off is that it's the closest thing humans have ever produced to a Lovecraftian programming language.
If every function is marked noexcept, does it make a difference? Either way, the point of C++ exceptions was to make the fast path even faster by moving error handling out of it. Since they had no idea what was wrong, the code evidently was running in the fast path since it was not terminating by throwing exceptions, yet it ran slower merely because of the C++ exception support.
In any case, my point is that C++ features often carry overhead that simply does not exist in C. Being able to get C++ code to run as fast as C code (possibly by not using any C++ features) does not change that.
Yes, but you cannot achieve faster code in C++ than you can also achieve in C, and the use of templates or dynamic dispatch certainly can come with a cost. I would also argue that you can write similar abstractions in C with very similar trade-offs. The difference is mostly that C has less syntactic sugar but everything is more obvious.
I'd love to see any examples you have of compile time metaprogramming libraries like Eigen or CTRE written instead in C. You can do a little of that with _Generic, but I'd generally prefer the nightmare that is templates to most of the hardcore macro magic I've encountered (e.g. Boost.PP), let alone constexpr.
I think this is asking the wrong question. In many cases it would be smarter to implement these algorithms using high-level abstractions and then let the optimizer specialize them again. This also works very well in C: https://godbolt.org/z/bohvffd7r I use it a lot, but I am not aware of a public project similar to Eigen. I am definitely convinced this could be done and would be very nice. One downside is that one might want to have more precise control. But even then there are solutions which IMHO are better than template metaprogramming.
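To illustrate the general idea (this is not the code behind the godbolt link, just a sketch with made-up names mat_vec and mat_vec_3x3, written in a C-like style): the generic routine takes its dimensions as ordinary parameters, and a thin wrapper passes constants. With inlining enabled, an optimizer is free to propagate the constants and unroll the loops, though whether it does so depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>

// Generic routine: the dimensions are ordinary runtime parameters.
static inline void mat_vec(const double *matrix, const double *vec, double *out,
                           std::size_t rows, std::size_t cols) {
    assert(matrix != nullptr && vec != nullptr && out != nullptr);
    for (std::size_t r = 0; r < rows; ++r) {
        double sum = 0.0;
        for (std::size_t c = 0; c < cols; ++c) {
            sum += matrix[r * cols + c] * vec[c];
        }
        out[r] = sum;
    }
}

// Hypothetical fixed-size entry point. After inlining, rows and cols are
// compile-time constants from the optimizer's point of view.
void mat_vec_3x3(const double matrix[9], const double vec[3], double out[3]) {
    mat_vec(matrix, vec, out, 3, 3);
}
```

Multiplying the row-major matrix {1,2,3, 4,5,6, 7,8,9} by the vector {1, 0, -1} through mat_vec_3x3 gives {-2, -2, -2}. No template is involved; the specialization, if it happens, is done by the optimizer.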
That's what Eigen does. You write the high level statement and it does template magic to convert that into an optimized series of BLAS calls, even omitting or combining calls (something impossible to do with just _Generic). CTRE does something similar. The parsing all happens at compile time, so code is only paying the cost of matching (which benefits from all the standard compiler optimizations). There's a platonically ideal compiler somewhere out there that could do both of these jobs too, but compilers are difficult enough and need to run fast enough that implementing every possible optimization in every domain isn't going to happen.
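As a toy illustration of "the parsing happens at compile time" (this is not CTRE's API, and PatternInfo and analyze_pattern are invented for the sketch): a constexpr function walks a pattern string and the results are checked with static_assert, so nothing is parsed at run time. It assumes C++14 or later for loops in constexpr functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mini "pattern compiler": it only counts capture groups and
// rejects unbalanced parentheses. Real libraries such as CTRE do far more.
struct PatternInfo {
    std::size_t groups;
    bool balanced;
};

constexpr PatternInfo analyze_pattern(const char *pattern) {
    std::size_t groups = 0;
    std::size_t depth = 0;
    bool balanced = true;
    for (std::size_t i = 0; pattern[i] != '\0'; ++i) {
        if (pattern[i] == '\\' && pattern[i + 1] != '\0') {
            ++i;  // skip the escaped character
        } else if (pattern[i] == '(') {
            ++groups;
            ++depth;
        } else if (pattern[i] == ')') {
            if (depth == 0) {
                balanced = false;  // closing parenthesis with nothing open
            } else {
                --depth;
            }
        }
    }
    return PatternInfo{groups, balanced && depth == 0};
}

// Evaluated entirely by the compiler.
constexpr PatternInfo date_pattern = analyze_pattern("(\\d+)-(\\d+)-(\\d+)");
static_assert(date_pattern.balanced, "pattern has unbalanced parentheses");
static_assert(date_pattern.groups == 3, "expected three capture groups");
```

A malformed pattern in a constexpr context would fail the static_assert and stop the build, which is the kind of guarantee an optimizer alone does not give you.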
I know what Eigen does. The point I tried to make is that you can let the optimizer specialize the code instead of a template engine, and this is much cleaner. If you want to do arbitrary transformations, you can just run a program at compile time. This is still much nicer than having template code, and even more powerful.
Yet all your descriptions have nothing to do with ISO C or ISO C++, but rather with quality of implementation in the GCC compiler toolchain.
The other guy’s remarks were based on the behavior of ancient compilers. I was describing current ones. In any case, my remarks mostly apply to LLVM too. GCC and LLVM are the only compilers that matter these days.
Intel replaced ICC with an LLVM fork, and Microsoft’s compiler is used by only a subset of Windows developers. There are few other compilers in widespread use for C and C++. I believe ARM has a compiler, but Linaro has made it such that practically nobody uses it. PathScale threw in the towel several years ago too. There is the CompCert C compiler, but it is used in only niche applications. I could probably name a few others if I tried, but they are progressively more niche as I continue.