> Python cuTile JIT compiler allows writing CUDA kernels in straight Python.
It is not straight Python today, and it never will be.
All these "performance-friendly" Python dialects (Triton, Pythran, cuTile, Numba, Pycell, CuPy, ...) look like Python, but they are nothing like Python once you scratch the surface.
They are DSLs with a Python-looking syntax, designed to be optimized, typed and type-inferred properly. And it feels like it when you use them: in each of them there are many (most?) Python features you simply cannot use, while you still suffer from Python's inherent issues.
Let's not lie to ourselves: Python is inherently bad for efficiency and performance.
And that goes way beyond the GIL: dynamic typing, reference semantics, monkey patching, an ultra-dynamic object model, the CPython ABI, BigInt by default, a runtime module system, ... are all technical choices that make sense for a small scripting language but really suck for HPC and efficiency.
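To make the "BigInt by default" point concrete, here is a minimal sketch using only the standard library: a CPython `int` is an arbitrary-precision object that grows instead of overflowing, while a real 64-bit machine integer (emulated here with `ctypes.c_int64`, which does no overflow checking) wraps around. Exact `getsizeof` numbers depend on the CPython build, so they are printed rather than claimed.

```python
import ctypes
import sys

# A CPython int is an arbitrary-precision heap object, not a machine word.
big = 2**63                 # one past the largest signed 64-bit value
print(big + 1)              # 9223372036854775809: no overflow, the int just grows

# Its memory footprint grows with the magnitude of the value.
print(sys.getsizeof(1), sys.getsizeof(2**100))

# A fixed-width 64-bit signed integer wraps instead (ctypes does no overflow checking).
wrapped = ctypes.c_int64(big).value
print(wrapped)              # -9223372036854775808
```

Hardware integers behave like the last line, so a performance-oriented dialect presumably has to pick a fixed width somewhere, which is one of the places where it stops being "straight Python".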
The entire NumPy/SciPy ecosystem is itself already just a hack around Python's limitations for simple CPU-bound tensor arithmetic, mainly because built-in Python performance is so bad that a simple for loop would make Excel look like a racehorse.
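A rough way to see the interpreter overhead behind that claim, sketched with the standard library only (NumPy itself is not used here): the same reduction written as a Python-level `for` loop and via the built-in `sum()`, whose loop runs inside CPython's C code. NumPy goes further by storing unboxed machine values in contiguous arrays. Timings vary by machine, so none are claimed here.

```python
import timeit

N = 1_000_000
data = list(range(N))

def python_loop(xs):
    # Each iteration is dispatched by the bytecode interpreter
    # and creates a new int object for the running total.
    total = 0
    for x in xs:
        total += x
    return total

def builtin_sum(xs):
    # Same reduction, but the loop itself runs in C inside sum().
    return sum(xs)

if __name__ == "__main__":
    assert python_loop(data) == builtin_sum(data)
    for fn in (python_loop, builtin_sum):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```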
Mojo is different.
Mojo tries to start from a clean sheet instead of hacking on the existing crap.
And it tries to provide a "Python-like experience", but on top of a well-designed language built on past language-design experience (Python is more than 30 years old).
And just for that, I wish them success.
> All these "performance-friendly" Python dialects (Triton, Pythran, cuTile, Numba, Pycell, CuPy, ...) look like Python, but they are nothing like Python once you scratch the surface.
Which is the whole point. Python does have properties that make it bad for massive, fast number twiddling. However, it's exceptionally nice for all the command-line parsing, file loading, setup and other wrapping tasks required to run those pipelines.
Fortran’s fantastic at math stuff. I’d sure hate to have to write all the related non-math stuff in it.
And yes, Python’s slower than other languages. But in production, most Python code spends a huge chunk of its time waiting for other code to execute. Python takes more CPU time than an AOT-compiled language to parse an HTTP request or load data files, but it’s just as efficient sitting there twiddling its thumbs while waiting for a DB query or a numeric library to finish.
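A minimal sketch of the "waiting" half of that argument, assuming the wait is a blocking call that releases the GIL. Here `time.sleep` stands in for a DB query, and `fake_db_query` is a hypothetical name, not a real API. Five 0.2 s waits in separate threads overlap, so the total comes out close to 0.2 s rather than 1 s, and the interpreter burns essentially no CPU while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_db_query(i):
    # Hypothetical stand-in for a DB call or native library call:
    # time.sleep blocks without holding the GIL, like most blocking I/O in CPython.
    time.sleep(0.2)
    return i

def run(n_queries=5):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_queries) as pool:
        results = list(pool.map(fake_db_query, range(n_queries)))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run()
    print(results, f"{elapsed:.2f}s")
```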
I love how dialects of C and C++ count as proper C and C++, and are even argued to be more relevant than the ISO standards themselves, but when anyone else does the same, it is no longer the same language.
As for Python not being the ideal, there we agree, but solutions with proper performance already exist: Lisp, Scheme, Julia, Futhark, ...
Heck, maybe someone could dig out StarLisp.
> I love how dialects of C and C++ count as proper C and C++, and are even argued to be more relevant than the ISO standards themselves
I did not argue that CUDA is proper C++ :)
I honestly believe that the best days of C++ as an accelerator language are behind it.
That is the main problem currently: we lack a modern systems programming language that plays well with accelerators. C++ is not (really) one of them (hello, aliasing).
I do not know if Mojo will succeed there, but I wish them good luck.