Because to make GPT-5 or Claude better than previous models, you need to do more reasoning which burns a lot more tokens. So, your per-token costs may drop, but you may also need a lot more tokens.

GPT-5 can be configured extensively. Is there any point at which any configuration of GPT-5 that offers ~DeepSeek level performance is more expensive than DeepSeek per token?