I've worked on optimizing slow modern code. Once you fix the few obvious bottlenecks, further optimization turns out to be very hard: the remaining time is spread across the whole codebase with no single hot spot left, and all of it is written in a slow language with no thought for performance.