The only way to beat CUDA, as in many other API cases, is with middleware.
Many keep forgetting that CUDA has been a polyglot platform for years: C, C++, Fortran, plus anything that targets PTX (some of which also targets OpenCL), meaning Haskell, Java, C#, Julia, Futhark, or Python bindings.
Then there are the libraries and the graphical GPGPU debugging tools.
By the way, Modular just announced partnerships with AWS and NVIDIA for Mojo and related tooling.