Interesting, i can see this being very similar to Nvidia's CUTE DSL. This hints that we are converging to a (local) optimal design for Python-based DSL kernel programming.
Interesting, i can see this being very similar to Nvidia's CUTE DSL. This hints that we are converging to a (local) optimal design for Python-based DSL kernel programming.