GPT-4.1 Pricing (per 1M tokens):
gpt-4.1
- Input: $2.00
- Cached Input: $0.50
- Output: $8.00
gpt-4.1-mini
- Input: $0.40
- Cached Input: $0.10
- Output: $1.60
gpt-4.1-nano
- Input: $0.10
- Cached Input: $0.025
- Output: $0.40
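
For anyone estimating a bill from these numbers, a minimal cost-estimator sketch (prices hardcoded from the list above; the token counts in the example call are hypothetical):

    # Rough per-request cost estimator for the GPT-4.1 family.
    # Prices are USD per 1M tokens, copied from the list above.
    PRICES = {
        "gpt-4.1":      {"input": 2.00, "cached": 0.50,  "output": 8.00},
        "gpt-4.1-mini": {"input": 0.40, "cached": 0.10,  "output": 1.60},
        "gpt-4.1-nano": {"input": 0.10, "cached": 0.025, "output": 0.40},
    }

    def cost(model, input_tokens, output_tokens, cached_tokens=0):
        """USD cost of one request; cached_tokens is the cached portion of the input."""
        p = PRICES[model]
        return (
            (input_tokens - cached_tokens) * p["input"]
            + cached_tokens * p["cached"]
            + output_tokens * p["output"]
        ) / 1_000_000

    # Hypothetical example: 10k input tokens (8k cached) and 1k output on 4.1-mini.
    print(f"${cost('gpt-4.1-mini', 10_000, 1_000, cached_tokens=8_000):.4f}")  # $0.0032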
Awesome, thank you for posting. As someone who regularly uses 4o mini from the API, any guesses or intuitions about the performance of Nano?
I'm not as concerned about nomenclature as other people are; I think that's too often a reaction to a headline rather than to the article. But in this case, I'm not sure whether I'm supposed to understand nano as categorically different from mini in terms of what it means as a variation on a core model.
They shared in the livestream that 4.1-nano is worse than 4o-mini, so nano is cheaper, faster, and has a bigger context window, but it's weaker in intelligence. 4.1-mini is smarter, but there's a price increase.
The fact that they're raising the price for the mini models by 166% is pretty notable.
gpt-4o-mini for comparison:
- Input: $0.15
- Cached Input: $0.075
- Output: $0.60
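
A quick sanity check of that 166% figure against the two price lists (plain arithmetic, nothing more):

    # Price increase from gpt-4o-mini to gpt-4.1-mini, per 1M tokens (USD).
    old_in, new_in = 0.15, 0.40
    old_out, new_out = 0.60, 1.60
    print(f"input:  +{(new_in / old_in - 1) * 100:.1f}%")    # +166.7%
    print(f"output: +{(new_out / old_out - 1) * 100:.1f}%")  # +166.7%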
That's what I was thinking. I hoped to see a price drop, but this does not change anything for my use cases.
I was using gpt-4o-mini via the batch API, which I recently replaced with mistral-small-latest, which costs $0.10/$0.30 (or $0.05/$0.15 when using its batch API). I may switch to 4.1-nano, but I'd have to be overwhelmed by its performance in comparison to mistral.
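
Comparing the two at batch rates, under the assumption that 4.1-nano gets OpenAI's usual 50% Batch API discount (the mistral batch prices are the ones quoted above; the nano batch rate is an assumption):

    # Batch costs (USD) for a hypothetical workload of 1M input + 0.25M
    # output tokens. The 4.1-nano batch rate assumes OpenAI's usual 50%
    # Batch API discount applies to this model.
    mistral_batch = {"input": 0.05, "output": 0.15}        # quoted above
    nano_batch = {"input": 0.10 / 2, "output": 0.40 / 2}   # assumed 50% off

    for name, p in (("mistral-small batch", mistral_batch),
                    ("gpt-4.1-nano batch (assumed)", nano_batch)):
        print(f"{name}: ${p['input'] * 1.0 + p['output'] * 0.25:.4f}")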
I don't think they ever committed to uniform pricing for mini models. Of course cheaper is better, but I understand pricing to be contingent on factors specific to each new model rather than following from a blanket policy.
Seems like 4.1-nano ($0.10) is the closer replacement price-wise, and 4.1-mini sits at a new in-between price point.
The cached input price is notable here: previously with GPT-4o it was 1/2 the cost of raw input; now it's 1/4.
It's still not as steep as Claude's 1/10 of the raw input cost, but it shows OpenAI is making improvements in this area.
Unless that has changed, Anthropic's (and Gemini's) caches are opt-in, though; if I recall correctly, OpenAI automatically caches for you.
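
To see what the steeper cache discount is worth, here's the blended input cost per 1M tokens as a function of cache-hit rate (a sketch; gpt-4o's $2.50/$1.25 list prices and the hit rates are assumptions for illustration):

    # Effective input cost per 1M tokens at a given cache-hit rate, comparing
    # the old 1/2 cached-input discount (gpt-4o) with the new 1/4 (gpt-4.1).
    def blended(raw, cached, hit_rate):
        return hit_rate * cached + (1 - hit_rate) * raw

    for hit in (0.0, 0.5, 0.9):
        old = blended(2.50, 1.25, hit)  # gpt-4o: cached = 1/2 of raw (assumed $2.50 raw)
        new = blended(2.00, 0.50, hit)  # gpt-4.1: cached = 1/4 of raw
        print(f"hit rate {hit:.0%}: gpt-4o ${old:.2f} vs gpt-4.1 ${new:.2f}")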