Thanks for sharing this. I'd love to learn more about micro-architectures and instruction sets - would you have any recommendations for books or sources that would be a good starting place?

My experience is mostly practical really - the trick is to learn how to compile stuff yourself.

If you do a typical: "cmake . && make install" then you will often miss compiler optimisations. There's no standard across different packages so you often have to dig into internals of the build system and look at the options provided and experiment.

Typically if you compile a C/C++/Fortran .cpp/.c/.fXX file by hand, you have to supply arguments to instruct the use of specific instruction sets. -march=native typically means "compile this binary to run with the maximum set of SIMD instrucitons that my current machine supports" but you can get quite granular doing things like "-march=sse4,avx,avx2" for either compatibility reasons or to try out subsets.