Distributed compute is cool, but $320 for 13 tokens/s on a tiny input prompt, with 4-bit quantization and a model with only 3B active parameters, is very underwhelming.