Super-low-latency inference might be helpful in applications like quant trading. However, in an era where a frontier model becomes outdated after six months, I wonder how useful it can be.

Also, quant trading probably cares more about embedding the content than about generating output tokens.