It’s shocking how close this feels to Claude. Obviously it’s much slower, but I don’t know that it’s significantly dumber. Interestingly, the imatrix quantization seems to be better than whatever quant the ZDR inference backends on OpenRouter are using. It was self-aware enough yesterday to realize that its own server process was itself, without me telling it, which is not something I’ve ever observed a local model doing before.

In my (obviously anecdotal) testing, DeepseekV4 Pro was better than Sonnet at coding. It is much slower, but also many times cheaper, especially with the current promotion.

Do they have a coding plan, or do you only pay per API call?

It’s just per token, but burning through 100 million+ tokens is a ~$3 transaction at their current pricing.

Do you use the official API or another provider?

Just directly. Paid for it with PayPal. It’s quite simple to set up and use.
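For reference, "simple to set up" really does mean a few lines: a minimal sketch of a chat-completion call against DeepSeek's OpenAI-compatible HTTP endpoint, using only the Python standard library. The base URL and the `deepseek-chat` model name are taken from DeepSeek's public docs; treat them (and the `DEEPSEEK_API_KEY` env var name) as assumptions to adjust.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint from DeepSeek's public docs.
API_URL = "https://api.deepseek.com/chat/completions"


def build_request(prompt: str, api_key: str,
                  model: str = "deepseek-chat") -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    # Only fires a real request if a key is present in the environment.
    key = os.environ.get("DEEPSEEK_API_KEY")
    if key:
        with urllib.request.urlopen(build_request("Hello", key)) as resp:
            reply = json.loads(resp.read())
            print(reply["choices"][0]["message"]["content"])
```

The same request shape works through any OpenAI-compatible client library by pointing its base URL at the official endpoint instead of a middleman.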

I use the official API. OpenRouter somehow didn't use prompt caching, and one short session with Qwen cost me $5.

You pay per API call, but you will be challenged to burn through $20 per month. 24/7 usage for a single agent will probably cost you around $100 per month. It is very efficient, especially with modern harnesses.