Hacker News

Lalabadie 6 hours ago

Off the top of my head, since that seems to be the quick info you're looking for: IIRC, of these two, the 27B is a dense model, meaning all of its parameters are active at inference. The 35B, meanwhile, is a Mixture of Experts (MoE), so only part of its network (3B?) is active for any given token.
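
If it helps, here's a toy sketch of the difference. This is not how either model is actually implemented: the expert count, `top_k`, and every name below (`moe_layer`, `make_expert`, `router_weights`) are made up for illustration. The idea is just that a router scores all the experts, but only the top few actually run for a given input.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x, experts, router_weights, top_k=2):
    """Run x through only the top_k highest-scoring experts (toy example)."""
    # Router: one score per expert, here a plain dot product with the input.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep the indices of the top_k experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalize the chosen scores into mixing weights.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


def make_expert(scale):
    """Hypothetical 'expert': just scales its input."""
    return lambda x: [scale * xi for xi in x]


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(s) for s in range(1, num_experts + 1)]
router_weights = [
    [random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)
]
x = [0.5, -1.0, 2.0, 0.1]

out, chosen = moe_layer(x, experts, router_weights, top_k=2)
print(f"experts run: {chosen} ({len(chosen)} of {num_experts})")
```

Setting `top_k=num_experts` makes every expert run on every input, which is roughly the dense case: all the weights do work every time, so more compute per token for the same total size.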

bakies 5 hours ago

Thanks! Dense models have been slow on my compute, but I'll give it a try. If it's not too slow, that's fine; I mostly fire-and-forget agents anyway.

Edit: seems fast! I'll try it out some more, thanks again.