This is more of a VAD/turn detection issue. It's gotten a lot better over the last few years, but it's a hard problem. The extra ~100ms of latency makes a huge difference otherwise, especially when you have use cases that require tool calling that can easily add 500ms+ of latency.

It seems that tool calling shouldn't be 500ms of latency?

If you have tool calling complex enough that it necessitates a higher reasoning level, and you would otherwise have reasoning set to "none", this can easily come out to 500ms.