The estimate for output tokens is too low, since a single reasoning-enabled response can burn through thousands of output tokens. It's also low for input tokens, since in actual use a lot of context (memory, agents.md, rules, etc.) gets included nowadays.
When using the APIs, you pay for reasoning tokens the same way you pay for regular output tokens, so the per-token estimate itself isn't affected by reasoning.
What reasoning affects is the ratio of input to output tokens, and since input tokens are cheaper, that may well affect the economics in the end.
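To make that concrete, here's a rough back-of-the-envelope sketch. The per-million-token prices and the token counts are made-up assumptions for illustration, not any provider's actual pricing:

    # Hypothetical prices; substitute whatever your model actually charges.
    INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens (assumed)
    OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (assumed)

    def request_cost(input_tokens, visible_output_tokens, reasoning_tokens=0):
        # Reasoning tokens are billed at the output rate even though
        # they never appear in the visible response.
        billed_output = visible_output_tokens + reasoning_tokens
        return (input_tokens * INPUT_PRICE_PER_M
                + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000

    # Same visible answer, with and without a few thousand reasoning tokens:
    print(request_cost(20_000, 800))                          # ~$0.07
    print(request_cost(20_000, 800, reasoning_tokens=5_000))  # ~$0.15

With reasoning on, the output side dominates the bill even though the visible answer is identical, which is exactly the ratio shift being described here.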
Correct, and with reasoning the ratio is totally off. As others have pointed out, actual usage is way higher (much more than 3-5x) than the article's estimate, which probably only reflects very light usage.