You don't fork/exec every time. You fork/exec once, then use a standard C template for a select or epoll loop on a Unix socket, and send all the data you need processed fast over that socket, with bidirectional comms.
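
Here's a rough sketch of the Python side of that, just to show the shape. Everything named here is made up for illustration: `WorkerClient`, `WORKER_SRC`, and the 4-byte length-prefix framing are my assumptions, not anything standard. The worker is a Python stand-in with a select loop so the snippet runs on its own; in the real setup it would be the compiled C binary.

```python
import socket
import struct
import subprocess
import sys

# Stand-in for the C worker (hypothetical): a select loop that reads
# length-prefixed requests from an inherited Unix socket and replies.
WORKER_SRC = r"""
import select, socket, struct, sys

sock = socket.socket(fileno=int(sys.argv[1]))  # fd inherited from the parent

def recv_exact(n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None  # parent closed its end
        buf += chunk
    return buf

while True:
    select.select([sock], [], [])  # block until the socket is readable
    header = recv_exact(4)
    if header is None:
        break
    (size,) = struct.unpack('!I', header)
    payload = recv_exact(size)
    if payload is None:
        break
    result = payload.upper()  # placeholder for the fast processing
    sock.sendall(struct.pack('!I', len(result)) + result)
"""


class WorkerClient:
    """Starts the worker once, then reuses one socket for every request."""

    def __init__(self):
        parent_sock, child_sock = socket.socketpair()  # AF_UNIX pair on POSIX
        # The only fork/exec: the child keeps child_sock open via pass_fds.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC, str(child_sock.fileno())],
            pass_fds=[child_sock.fileno()],
        )
        child_sock.close()  # the parent only needs its own end
        self.sock = parent_sock

    def _recv_exact(self, n):
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("worker closed the socket")
            buf += chunk
        return buf

    def call(self, payload: bytes) -> bytes:
        # Assumed framing: 4-byte big-endian length, then the payload.
        self.sock.sendall(struct.pack("!I", len(payload)) + payload)
        (size,) = struct.unpack("!I", self._recv_exact(4))
        return self._recv_exact(size)

    def close(self):
        self.sock.close()  # worker sees EOF and leaves its loop
        self.proc.wait()


if __name__ == "__main__":
    worker = WorkerClient()
    for msg in (b"first", b"second", b"third"):
        print(worker.call(msg))  # same process handles every request
    worker.close()
```

The point is that `Popen` runs once in the constructor and every `call` after that is just a write and a read on an already-open socket, so the process startup cost is paid a single time.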

Better yet, you can often prototype in Python for rapid dev, and then when you want performance, you can translate it to pretty much whatever, including C, using LLMs, which do a pretty good job. With coding agents, you can set them up to basically run the original and translated code side by side against a bunch of inputs and automatically fix whatever differs. We pretty much did this at our job to translate an internal API backend into a web server written purely in C, and it came out fully memory safe.
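
To make the "side by side against a bunch of inputs" part concrete, here's a minimal sketch of that kind of comparison harness. It's not what we actually ran; `diff_test`, `prototype_total`, and `translated_total` are hypothetical names, and the "translated" function is a Python stand-in with a deliberate divergence so there's something to catch. In practice the candidate would call into the C build.

```python
from typing import Any, Callable, Iterable, List, Tuple


def _run(fn: Callable[[Any], Any], item: Any) -> Tuple[str, Any]:
    """Capture either the return value or the exception type name."""
    try:
        return ("ok", fn(item))
    except Exception as exc:
        return ("error", type(exc).__name__)


def diff_test(
    reference: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Tuple[Any, Tuple[str, Any], Tuple[str, Any]]]:
    """Run both implementations on every input and collect disagreements."""
    mismatches = []
    for item in inputs:
        expected = _run(reference, item)
        actual = _run(candidate, item)
        if expected != actual:
            mismatches.append((item, expected, actual))
    return mismatches


def prototype_total(text: str) -> int:
    # Hypothetical Python prototype: sum a comma-separated list of ints.
    return sum(int(part) for part in text.split(","))


def translated_total(text: str) -> int:
    # Hypothetical stand-in for the translated version. It silently skips
    # empty fields, which the prototype rejects with ValueError.
    total = 0
    for part in text.split(","):
        if part:
            total += int(part)
    return total


if __name__ == "__main__":
    cases = ["1,2,3", "10", "", "4,,5"]
    for item, expected, actual in diff_test(prototype_total, translated_total, cases):
        print(f"input={item!r} prototype={expected} translated={actual}")
```

The list of mismatches is what you'd hand back to the agent: each entry has the input plus what both versions did with it, which is usually enough for it to patch the translation and rerun.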