I think we're talking past each other. I'm not suggesting that most users should be writing individual computational ops themselves in C. They'd be unlikely to match the perf of expert-written C that has had time invested in it. The point I'm trying to make is that when you use a framework for modern AI you're not just calling an individual op, or many individual ops, in isolation. It matters how multiple dependent ops are sequenced along with other code, e.g. data loading code (which may be custom and application-specific, so not available in a library). My argument is that it may be easier to reach peak performance on your hardware if that framework code were all written in a lower-level language.

Ah, I see. For production or most real-time systems you'd be right, imo, but in a conventional AI development environment most tasks take long enough that the overhead of Python moving from lib to lib becomes negligible, no?
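
A rough way to sanity-check that claim is to estimate what share of one step goes to Python-level call overhead. This is only a sketch: `noop`, `per_call_overhead` and `overhead_fraction` are made-up names for illustration, the 1,000 calls and 0.5 s of compute per step are assumed numbers, and an empty function call is a lower bound on what a real library call costs to dispatch.

```python
import time


def noop():
    """Hypothetical stand-in for a thin Python wrapper around a library call."""
    return None


def per_call_overhead(n_calls=100_000):
    """Estimate the average cost, in seconds, of one Python-level function call."""
    start = time.perf_counter()
    for _ in range(n_calls):
        noop()
    return (time.perf_counter() - start) / n_calls


def overhead_fraction(calls_per_step, compute_seconds_per_step, call_cost):
    """Share of one step's wall time spent on Python call overhead."""
    overhead = calls_per_step * call_cost
    return overhead / (overhead + compute_seconds_per_step)


if __name__ == "__main__":
    cost = per_call_overhead()
    # Assumed workload: 1,000 library calls and 0.5 s of real compute per step.
    share = overhead_fraction(1_000, 0.5, cost)
    print(f"{share:.4%} of step time is Python call overhead")
```

This only measures call dispatch. It says nothing about how dependent ops and custom data loading are sequenced, which is the part the previous comment is concerned with.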