> DeepSeek-R1-0528-Qwen3-8B https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B ... Released today; probably the best reasoning model in 8B size.

  ... we distilled the chain-of-thought from DeepSeek-R1-0528 to post-train Qwen3-8B Base, obtaining DeepSeek-R1-0528-Qwen3-8B ... on AIME 2024, surpassing Qwen3-8B by +10.0% & matching the performance of Qwen3-235B-thinking.

Wild how effective distillation is turning out to be. No wonder most shops have begun to "hide" CoT now: https://news.ycombinator.com/item?id=41525201

> Beyond its improved reasoning capabilities, this version also offers a reduced hallucination rate, enhanced support for function calling, and better experience for vibe coding.

Thank you for thinking of the vibe coders.