I was looking into matmul algorithms and their hardware implementations (with a view to starting a hardware company), and I saw that the naive O(n^3) algorithm is what everyone actually uses.
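
For anyone unsure what "naive" means here, this is a minimal sketch of the schoolbook triple loop in plain Python. The function name `naive_matmul` and the multiplication counter are my own illustration, not taken from the linked article, and real implementations add tiling and vectorization on top of the same O(n^3) arithmetic.

```python
def naive_matmul(a, b):
    """Schoolbook matrix product of two lists-of-lists.

    Returns (c, mults), where mults counts scalar multiplications.
    The name and the counter are hypothetical, for illustration only.
    """
    n, m = len(a), len(b)
    p = len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0] * p for _ in range(n)]
    mults = 0
    # i-k-j loop order: same arithmetic as i-j-k, but walks b row by row.
    for i in range(n):
        for k in range(m):
            a_ik = a[i][k]
            for j in range(p):
                c[i][j] += a_ik * b[k][j]
                mults += 1
    return c, mults


if __name__ == "__main__":
    product, mults = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, mults)
```

For two n x n inputs the counter comes out to exactly n^3, which is the cost the sub-cubic algorithms try to beat.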

https://leetarxiv.substack.com/p/why-compilers-rarely-use-st...