> OpenAI made a huge mistake neglecting fast inferencing models.

It's a lost battle. It'll always be cheaper to use an open-source model hosted by providers like Together/Fireworks/DeepInfra/etc.

I've been maining Mistral lately for low-latency stuff, and the price-to-quality ratio is hard to beat.

I'll try benchmarking Mistral against my eval. I've been impressed by Kimi's performance, but it's too slow to do anything useful in real time.