I'm excited to see what cuTile-rs unlocks. I like the direction of HuggingFace's grout project (https://github.com/huggingface/grout) for local LLM inference:

- state of the art performance

- codebase that fits in a context window (including kernel definitions!)

- single binary deployment

Similar to antirez's ds4.c, but in Rust, and with cuTile making kernels both easier to author and faster.
