> OpenAI made a huge mistake neglecting fast inferencing models.
It's a lost battle. It'll always be cheaper to use an open source model hosted by others like together/fireworks/deepinfra/etc.
I've been maining Mistral lately for low-latency stuff, and the price-to-quality ratio is hard to beat.
I'll try benchmarking Mistral against my eval. I've been impressed by Kimi's performance, but it's too slow to do anything useful in real time.