Hacker News

This is nice and useful because the new GPT-OSS model uses this technique. Kudos to the original authors!

And, as always, the FOSS ecosystem moves quickly: llama.cpp already fully supports them! https://github.com/ggml-org/llama.cpp/pull/15157