The top trading firms are firing off orders in double-digit nanoseconds, not milliseconds.
In some cases the order leaving the card starts to emerge before the packet containing the market data event that they're responding to has even finished arriving.
Waiting a full microsecond for the packet to arrive before responding means you're already too slow.
The speed game is essentially over.
Doesn't the fact that a modern FPGA-centric (probably with ASICs in the mix too at this point) hybrid NIC/order-parser/state-machine is rumored to hit glass-to-glass of ~20-40ns mean that the speed game is hotter than ever?
Do you mean that because it involves a lot of hardware design now? The days of being able to offer around the inside in C++ on a regulated securities exchange are over, but there's still C++ driving the thing. That 20ns "tick to trade", or however it's being measured in a given instance, is still pretty basic response stuff, and light speed is still a thing. There's a C++ program upstairs running the show, and it's trying to do its job in under a mike for sure.
The OG talk on this is Carl Cook's: https://www.youtube.com/watch?v=NH1Tta7purM
But there are more recent talks (Optiver is especially transparent about it, but other people talk about it too): https://www.youtube.com/watch?v=sX2nF1fW7kI (that's David Gross at CppCon last year). It can't have changed that much since then.
That’s irrelevant to the fact that the expected PnL on a millisecond of latency improvement is a lot more than 1M in some markets. Obviously, if you are getting whatever trade you are concerned with off in less than one millisecond, the question isn’t well posed.
There are many more games to play than delta-one takeout, and the solutions certainly don’t fit on one or a handful of FPGAs.