This article is about a MoE model with only 4B active parameters, it shouldn't take 10 minutes to answer a question about a small project.
I measured a 4bit quant of this model at 1300t/s prefill and ~60t/s decode on Ryzen 395+.
This article is about a MoE model with only 4B active parameters, it shouldn't take 10 minutes to answer a question about a small project.
I measured a 4bit quant of this model at 1300t/s prefill and ~60t/s decode on Ryzen 395+.