That's about what we've seen as well (even directly from DeepSeek themselves).

We've been using it for async "heartbeat" processing and SMS replies, but it's just too slow for live chat replies (which is a shame, as I'd really love to use it there).

Very capable model, but also very slow.

That isn't what the charts on OpenRouter appear to show, though they only seem to go back one week (unless I missed something). It should be less than 2 seconds to first token and anywhere from 15 to 50 tps depending on the provider. Admittedly 15 is a bit slow, but most providers look to be closer to 30 or 40, which I personally think is fine.

https://openrouter.ai/deepseek/deepseek-v4-pro/performance
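For a rough feel of what those numbers mean for a chat reply, here is a back-of-the-envelope sketch. The 300-token reply length and the flat 2-second time to first token are my own assumptions for illustration, not figures from the OpenRouter charts, and the function name is made up.

```python
def estimated_reply_seconds(output_tokens: int, tps: float, ttft_s: float = 2.0) -> float:
    """Rough wall-clock time for a full reply: time to first token plus generation time."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return ttft_s + output_tokens / tps


# Assumed 300-token reply across the throughput range quoted above.
for tps in (15, 30, 40, 50):
    seconds = estimated_reply_seconds(300, tps)
    print(f"{tps:>2} tps -> {seconds:.1f} s for a 300-token reply")
```

With streaming the user sees text after the first couple of seconds, so the full-reply time matters most when the reply has to be complete before it is sent.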

Have you tried their flash model? Pro was too slow for me too, but I've found flash to be more than capable, and it's faster than GPT-5.5 at medium.

It's actually on my list this week to look at putting together an intelligence escalation flow MVP (initial assumption would be that flash is good for 60-80% of my users' workflows, with only the tricky questions needing a more capable model. Whether I can put together a proper detection system is yet to be seen).
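A minimal sketch of what such an escalation flow could look like, purely as a starting point. Everything here is hypothetical: the tier labels "flash" and "pro" are placeholders rather than real model IDs, the keyword list and word-count threshold are guesses, and `call_model` stands in for whatever client you actually use. The detection part is the hard bit and this heuristic is only a stand-in for it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tier labels; swap in real model identifiers.
FAST_MODEL = "flash"
STRONG_MODEL = "pro"

# Assumed hints that a question is "tricky"; illustrative only.
HARD_HINTS = ("prove", "debug", "step by step", "legal", "refactor")


@dataclass
class Routed:
    model: str
    reply: str
    escalated: bool


def looks_hard(prompt: str, max_easy_words: int = 150) -> bool:
    """Cheap pre-check: long prompts or prompts with hard-task hints skip the fast tier."""
    text = prompt.lower()
    return len(text.split()) > max_easy_words or any(h in text for h in HARD_HINTS)


def answer(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_weak: Callable[[str], bool] = lambda reply: not reply.strip(),
) -> Routed:
    """Try the fast tier first; escalate on the pre-check or on a weak reply."""
    if looks_hard(prompt):
        return Routed(STRONG_MODEL, call_model(STRONG_MODEL, prompt), True)
    reply = call_model(FAST_MODEL, prompt)
    if is_weak(reply):
        # Post-check failed, so retry once on the stronger tier.
        return Routed(STRONG_MODEL, call_model(STRONG_MODEL, prompt), True)
    return Routed(FAST_MODEL, reply, False)
```

Logging the `escalated` flag per request would show whether the 60-80% assumption holds for real traffic before investing in a smarter detector.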

The biggest issue I've had with flash is that it seems to hit a sort of "dumb o'clock" wall. Right around the time Beijing would be going to work, response quality takes a dump on instruction-heavy tasks when context grows beyond ~120k tokens.

Responses are still usable, no hallucinations or anything, but it's worth keeping in mind if you rely on detailed instructions or large context windows.