Is this Triton's reply to NVIDIA's Tilus [1]? Tilus is supposed to be lower level (e.g. you have control over registers). NVIDIA really does not want the CUDA ecosystem to move to Triton, as Triton also supports AMD and other accelerators. So with Gluon you get access to lower-level features and you can stay within the Triton ecosystem.

[1] https://github.com/NVIDIA/tilus

There's a lot of pressure on the CUDA ecosystem at this point:

- Most of the trillion-dollar companies have their own chips with AI features (Apple, Google, MS, Amazon, etc.). GPUs and AI training are among their biggest incentives. They are super motivated to not donate major chunks of their revenue to Nvidia.

- Mac users generally don't use Nvidia with their Mac hardware anymore, and Apple's CPUs are a popular platform for doing stuff with AI.

- AMD, Intel and other manufacturers want in on the action.

- The Chinese and others are facing export restrictions for Nvidia's GPUs.

- Platforms like Mojo (a natively compiled Python with some additional language features for AI) and others are getting traction.

- A lot of the popular AI libraries support things other than Nvidia at this point.

This just adds to that. Nvidia might have to open up CUDA to stay relevant. They do have a performance advantage. But forcing people to choose inevitably leads to plenty of choice being available to users. And the more users choose differently, the less relevant CUDA becomes.

It sounds like they share that goal. Gluon is a thing because the Triton team realized over the last few months that Blackwell is a significant departure from Hopper, and achieving >80% SoL kernels is becoming intractable as the Triton middle-end simply can't keep up.

Some more info in this issue: https://github.com/triton-lang/triton/issues/7392

Also it REALLY jams me up that this is a thing, complicating discussions: https://github.com/triton-inference-server/server

Oh! I thought it was that, having jumped straight to the comments before reading the article.

It feels like Nvidia has 30 "tile-based DSLs with Python-like syntax for ML kernels" in the works lol. I think they are very worried about open source and portable alternatives to CUDA.

Not at all. They are the ones pushing for vendor-agnostic tensor core extensions in Vulkan, which would solve some part of the portability issue: https://github.com/jeffbolznv/vk_cooperative_matrix_perf

I believe it’s the other way around; Gluon exposes the primitives Triton was built on top of.

No, Gluon was in development before Tilus was announced. Could be a response to CuTe DSL though.

[1]: https://docs.nvidia.com/cutlass/media/docs/pythonDSL/cute_ds...