Hacker News

So I think the takeaway here is that this is a super fast companion model to larger models, one that reasons quickly. Perhaps this technique could be used to train a highly optimized reasoning "expert" in an MoE.