Hacker News

It's being explored right now for speculative decoding in the local-LLM space, which I think is quite an interesting use case.

DFlash immediately came to my mind.

There are several Mac implementations of it that already show Qwen3.5 running more than 2x faster.