Optimising hardware to run existing software is how you sell your hardware.
The amount of performance you can extract from a modern CPU if you really start optimising cache access patterns is astounding.
High-performance networking is another area like this. High-performance NICs still go to great lengths to provide a BSD socket experience to devs. You can still get 80-90% of the performance advantages of kernel bypass without abandoning that model.
> The amount of performance you can extract from a modern CPU if you really start optimising cache access patterns is astounding
I think this was one of the main points, and I want to emphasise this, behind the Odin programming language.
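To make the cache point concrete, here is a rough sketch in C (not Odin, and not taken from any real codebase) of the classic array-of-structs versus struct-of-arrays layout choice. The names `Particle`, `Particles`, `total_mass_aos` and `total_mass_soa` are made up for illustration. Both functions compute the same sum; the difference is only in how much unrelated data gets pulled through the cache along the way.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs (AoS): all fields of one particle sit next to each other. */
typedef struct {
    float x, y, z;
    float mass;
} Particle;

/* Struct-of-arrays (SoA): each field lives in its own contiguous array. */
typedef struct {
    float *x, *y, *z;
    float *mass;
    size_t count;
} Particles;

/* Sums mass over AoS data. Only `mass` is read, but the x/y/z fields
 * share the same cache lines, so they are fetched as well. */
float total_mass_aos(const Particle *p, size_t n) {
    assert(p != NULL || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += p[i].mass;
    }
    return sum;
}

/* Sums mass over SoA data. The loop walks one dense array of floats,
 * so every byte fetched is a byte the loop actually uses. */
float total_mass_soa(const Particles *p) {
    assert(p != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < p->count; i++) {
        sum += p->mass[i];
    }
    return sum;
}
```

How much the SoA version wins by depends on the hardware, the element size and what else the loop does, so measure before restructuring anything. As far as I know Odin has built-in support for SoA layouts, which is the kind of thing I had in mind.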