Which, let's be honest, is probably still being orchestrated by Python somewhere.
Python is 9.75 million times faster than Python.
I was researching whether there was much benefit to using Rust or C++ over Python for AI, and it turns out the GPU doesn't care once the instructions are in, because what runs on the GPU is an entirely different spec. The only thing you might save on is the "startup" cost of getting your code onto the GPU, I guess? I assume that time cost is minuscule though; once it's all in memory, nobody cares that you spent any time "booting it up" any more than they care how long Windows takes these days.
As long as you don't keep calling out to the CPU, that is.
Tool calling, searches, cache movement if used, and even debug steps all stall the GPU waiting for the CPU.
There was a test that turned one of the sub-1B Qwen3+ models into a single kernel that runs as one GPU pass without stalling on the CPU, and it saw quite a bit of perf lift over vLLM, I believe, which shows this is still an issue.
It's been a month, so I don't remember more details than this.
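To put rough shape on the stall point, here is a toy cost model in plain Python. It is only a sketch: the function name `total_time_ms` and all of the numbers are made up for illustration, not measurements from that test or from vLLM. It assumes every GPU step costs the same and that each round trip to the CPU adds a fixed stall.

```python
def total_time_ms(steps, gpu_ms_per_step, cpu_stall_ms, sync_every):
    """Toy model: GPU time for all steps plus one CPU stall every `sync_every` steps.

    All parameters are hypothetical; real stalls vary with the workload.
    """
    if sync_every <= 0:
        raise ValueError("sync_every must be a positive integer")
    stalls = steps // sync_every  # number of times the GPU waits on the CPU
    return steps * gpu_ms_per_step + stalls * cpu_stall_ms


# Hypothetical numbers: 1000 steps, 0.05 ms of GPU work each, 0.02 ms per stall.
chatty = total_time_ms(1000, 0.05, 0.02, sync_every=1)    # CPU involved every step
fused = total_time_ms(1000, 0.05, 0.02, sync_every=1000)  # one round trip in total

print(f"chatty: {chatty:.2f} ms, fused: {fused:.2f} ms")
```

In this model the GPU work is identical in both cases; the whole gap comes from how often the CPU gets pulled in, which is the argument for the single-kernel approach regardless of which language does the orchestrating.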
you can port anything Python is doing into Rust/C++ with a couple of prompts, including parity validation. when the barrier to migrating is that thin, you are losing money and time even continuing to talk about it. Python is miserably slow, so don't let it touch any part of your system. no snakes in the house.
PyTorch dataloaders are often horribly inefficient; a lot of stuff there can benefit from Rust/C++.