Hacker News

Taichi, benchmarked in the article, claims to outperform CUDA at some GPU tasks, although its benchmarks look to be a few years old:

They also don't account for cuTile, NVIDIA's new API infrastructure that supports writing CUDA directly in Python via an MLIR-based JIT.