The vendor-agnostic GPU approach via KernelAbstractions is great to see. The Vulkan compute path is underrated for this — it runs on AMD, NVIDIA, and Intel without needing ROCm or CUDA, just whatever driver ships with the GPU.
Re: the compilation latency discussion, it's a real tension. JIT gives you expressiveness but kills startup. AOT gives you instant start but limits flexibility. Interesting that most GPU languages went JIT when what you hand the driver is pre-compiled SPIR-V/PTX anyway (the driver then lowers that to the GPU's native ISA).