I saw videos of coding with Mimo-V2.5-Pro UltraSpeed, which is advertised at 1,000 tokens/s. That is very impressive:
https://www.bilibili.com/video/BV1fME16uEW7
If the time-to-first-token latency has also improved greatly, this could be very useful for end-to-end control applications, autonomous driving for example.
It’s awesome, particularly since it’s at DeepSeek-tier prices (3x the price of DS-V4-Pro). At 1,000 tok/s, though, you can really rip through tokens. (About $9 an hour if you manage to run the output nonstop.)
In practice it tends to cost more than DeepSeek, since it doesn’t seem to get as many input cache hits.
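For anyone who wants to check the "$9 an hour" figure, here is a rough back-of-the-envelope sketch. It assumes the $9 covers output tokens only and that generation really runs at a sustained 1,000 tokens/s for the full hour. The per-million price it prints is only what those two numbers imply, not a quoted list price, and the function names are made up for illustration. Input tokens and cache misses would come on top of this.

```python
SECONDS_PER_HOUR = 3600


def tokens_per_hour(tokens_per_second: float) -> float:
    """Tokens generated in one hour of nonstop output."""
    return tokens_per_second * SECONDS_PER_HOUR


def implied_price_per_million(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Output price (USD per 1M tokens) implied by an hourly cost and a speed."""
    return hourly_cost_usd * 1_000_000 / tokens_per_hour(tokens_per_second)


def hourly_cost(price_per_million_usd: float, tokens_per_second: float) -> float:
    """Hourly output cost (USD) for a given per-million price and speed."""
    return price_per_million_usd * tokens_per_hour(tokens_per_second) / 1_000_000


if __name__ == "__main__":
    speed = 1000.0        # tokens/s, the advertised figure
    cost_per_hour = 9.0   # USD/hour, the estimate from the comment above

    print(f"tokens per hour: {tokens_per_hour(speed):,.0f}")
    print(f"implied output price: ${implied_price_per_million(cost_per_hour, speed):.2f} per 1M tokens")
```

So 1,000 tok/s is 3.6 million tokens an hour, and $9 an hour works out to roughly $2.50 per million output tokens under those assumptions. Plug a different price into `hourly_cost` to see how the hourly burn changes.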