I think this is asking the wrong question. In many cases it would be smarter to implement these algorithms using high-level abstractions and then let the optimizer specialize them again. This also works very well in C: https://godbolt.org/z/bohvffd7r I use this approach a lot, but I am not aware of a public project similar to Eigen. I am definitely convinced this could be done and would be very nice. One downside is that one might want more precise control. But even then there are solutions which IMHO are better than template metaprogramming.
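
As a rough illustration of the idea (this is not the code behind the godbolt link; the names `fold`, `add`, `add_sq`, `sum` and `sum_of_squares` are made up for this sketch), a generic routine can take its operation as an ordinary function pointer. When the call site passes a known function, GCC and Clang at -O2 will typically inline both the routine and the callback, so each wrapper ends up as a plain specialized loop. That is an optimization, not a guarantee, so it is worth checking the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* The operation is a plain function pointer, not a template parameter. */
typedef double (*binop_fn)(double acc, double x);

/* Generic left fold over an array of doubles. */
static inline double fold(const double *xs, size_t n, double init, binop_fn op)
{
    assert(xs != NULL || n == 0);
    double acc = init;
    for (size_t i = 0; i < n; i++)
        acc = op(acc, xs[i]);
    return acc;
}

static inline double add(double acc, double x)    { return acc + x; }
static inline double add_sq(double acc, double x) { return acc + x * x; }

/* The callback is a compile-time constant at each call site, so the
   optimizer can specialize fold() for it (hypothetical wrappers). */
double sum(const double *xs, size_t n)
{
    return fold(xs, n, 0.0, add);
}

double sum_of_squares(const double *xs, size_t n)
{
    return fold(xs, n, 0.0, add_sq);
}
```

The generic code is written once and stays readable; the specialization happens in the optimizer rather than in a template engine.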

That's what Eigen does. You write the high-level statement and it does template magic to convert that into an optimized series of BLAS calls, even omitting or combining calls (something impossible to do with just _Generic). CTRE does something similar. The parsing all happens at compile time, so the code only pays the cost of matching at run time (which benefits from all the standard compiler optimizations). There's a platonically ideal compiler somewhere out there that could do both of these jobs too, but compilers are difficult enough and need to run fast enough that implementing every possible optimization in every domain isn't going to happen.

I know what Eigen does. The point I tried to make is that you can let the optimizer specialize the code instead of a template engine, and this is much cleaner. If you want to do arbitrary transformations, you can just run a program at compile time. This is still much nicer than having template code, and even more powerful.
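
One way to read "run a program at compile time" is a small generator that the build runs before compiling the real code. The sketch below assumes that reading; the function name `emit_squares_table` and the table it emits are hypothetical and only show the shape of the approach.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical generator step: writes a C definition of a table of
   squares to `out`. Returns 0 on success, -1 on error. */
int emit_squares_table(FILE *out, unsigned n)
{
    assert(out != NULL);
    if (n == 0)
        return -1;  /* a zero-length array would not be valid C */

    if (fprintf(out, "static const unsigned squares[%u] = {", n) < 0)
        return -1;
    for (unsigned i = 0; i < n; i++) {
        /* Separate entries with ", " after the first one. */
        if (fprintf(out, "%s%u", i ? ", " : "", i * i) < 0)
            return -1;
    }
    if (fprintf(out, "};\n") < 0)
        return -1;
    return 0;
}
```

A generator binary whose `main` calls `emit_squares_table(stdout, 256)` could be run by the build with its output redirected to a file that the main program then pulls in with `#include`. The generator is ordinary C, so it can do any transformation a template could, and it can be debugged like any other program.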