That is why profiling is the only way to good performance. It's what lets you know what matters, and it's the only thing that does or can. I've been doing low-level (as well as high level) programming for more than 25 years, and I don't know in advance what is more efficient than what. An operation that was inefficient in the program I wrote yesterday under high contention or bad branch prediciton could be efficient in the program I'll write tomorrow. I can only know that if I profile my specific program (and when writing code for different architectures, I need to profile my program on all of them, because what's efficient on x86-64 may be inefficient on Aarch64 or vice-versa). The days we could tell that something is efficient or not, except for the obvious cases, are gone. Computers, at both the hardware and software infrastructure layers, don't work like that anymore.

If your profile shows you a hot path that's responsible for 90% of the time your program spends, any second optimising anything outside of it harms your performance, as it's a second spent on low ROI instead of high ROI.