Doesn't the fact that a modern FPGA-centric hybrid NIC/order-parser/state-machine design (probably with ASICs in the mix too at this point) is rumored to hit glass-to-glass latencies of ~20-40ns mean that the speed game is hotter than ever?
Do you mean because it involves a lot of hardware design now? The days of being able to offer around the inside in C++ alone on a regulated securities exchange are over, but there's still C++ driving the thing. That 20ns "tick to trade" figure, however it's being measured in a given instance, covers only pretty basic response stuff, and light speed is still a thing. There's a C++ program upstairs running the show, and it's trying to do its job in under a microsecond for sure.
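To put rough numbers on the light-speed point, here's a back-of-envelope sketch. The helper names are mine, and the 2/3-of-c figure for fibre is a common rule of thumb I'm assuming, not a measurement of any real link:

```cpp
#include <cstdio>

// Speed of light in vacuum, metres per second (exact by SI definition).
constexpr double kLightSpeedVacuum = 299792458.0;

// Assumption: signals in optical fibre travel at roughly 2/3 of c.
constexpr double kFibreFactor = 2.0 / 3.0;

// Distance in metres a signal covers in `ns` nanoseconds at `fraction` of c.
constexpr double metres_in(double ns, double fraction = 1.0) {
    return kLightSpeedVacuum * fraction * ns * 1e-9;
}

// Print the distance budget for one latency figure (hypothetical helper).
inline void print_budget(double ns) {
    std::printf("%7.0f ns: %8.2f m in vacuum, %8.2f m in fibre\n",
                ns, metres_in(ns), metres_in(ns, kFibreFactor));
}
```

Calling `print_budget` with 20, 40 and 1000 gives roughly 6 m, 12 m and 300 m in vacuum. So a 20-40ns response is on the same scale as the cabling inside a colo cage, while a microsecond still only buys a few hundred metres.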
The OG talk on this is Carl Cook's: https://www.youtube.com/watch?v=NH1Tta7purM
But there are more recent talks (Optiver is especially transparent about it, but other people talk about it too): https://www.youtube.com/watch?v=sX2nF1fW7kI. That's David Gross at CppCon last year, and it can't have changed that much since then.