So Opus that costs $15.00/$75.00 for 1mil tokens (input/output) is now worse than the model that costs $3.00/$15.00?
That's according to https://docs.anthropic.com/en/docs/about-claude/models which lists "claude-3-5-sonnet-20241022" as the latest model as of today
Yes, you will find similar things at essentially all other model providers.
The older/bigger GPT-4 runs at $30/$60 and performs about on par with GPT-4o mini, which costs only $0.15/$0.60.
If you are integrating AI models now, or have been at any point in the past ~2 years, you should definitely keep up with model capability/pricing developments. If you stay on old models you are certainly overpaying and/or leaving performance on the table. It's essentially a tax for lack of agility.
> The older/bigger GPT-4 runs at $30/$60 and performs about on par with GPT-4o mini, which costs only $0.15/$0.60.
I don't think GPT-4o Mini has comparable performance to GPT-4 at all, where are you finding the benchmarks claiming this?
Everywhere I look says GPT-4 is more powerful, but GPT-4o Mini is most cost-effective, if you're OK with worse performance.
Even OpenAI themselves about GPT-4o Mini:
> Our affordable and intelligent small model for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo.
If it was "on par" with GPT-4 they would surely say this.
> should definitely keep up with model capability/pricing development
Yeah, I mean that's why we're both here and why we're discussing this very topic, right? :D
Just switch out gpt-4o-mini for gpt-4o, the point stands. Across the board, these foundational model companies have comparable, if not more powerful, models that are cheaper than their older models.
OpenAI's own words: "GPT-4o is our most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities."
gpt-4o:
$2.50 / 1M input tokens $10.00 / 1M output tokens
gpt-4-turbo:
$10.00 / 1M input tokens $30.00 / 1M output tokens
gpt-4:
$30.00 / 1M input tokens $60.00 / 1M output tokens
https://openai.com/api/pricing/
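To put those per-million-token rates in perspective, here's a small sketch that computes the cost of a single request at each tier. The prices are the ones quoted above; the helper function and the example token counts are made up for illustration.

```python
# Per-1M-token prices (input, output) in USD, as quoted above.
PRICES = {
    "gpt-4o":      (2.50, 10.00),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-4":       (30.00, 60.00),
}

def call_cost(model, input_tokens, output_tokens):
    """USD cost of one request with the given token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 3k-token prompt with a 1k-token reply.
for model in PRICES:
    print(model, round(call_cost(model, 3_000, 1_000), 4))
# gpt-4o ≈ $0.0175, gpt-4-turbo ≈ $0.06, gpt-4 ≈ $0.15
```

For that example request, gpt-4 costs roughly 8-9x what gpt-4o does for the same call, which is the kind of gap that makes staying on old models expensive.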
I found that gpt-4-turbo beat gpt-4o pretty consistently for coding tasks, but claude-3.5-sonnet beat both of them, so it's what I have been using most of the time. gpt-4o-mini is adequate for summarizing text.
> Yeah, I mean that's why we're both here and why we're discussing this very topic, right? :D
That wasn't specifically directed at "you", but more as a plea to everyone reading that comment ;)
I looked at a few benchmarks comparing the two, which, as in the Opus 3 vs Sonnet 3.5 case, is hard, since the benchmarks the wider community is interested in shift over time. I think this page[0] provides the best overview I can link to.
Yes, GPT-4 is better on the MMLU benchmark, but in all other benchmarks and the LMSys Chatbot Arena scores[1], GPT-4o mini comes out ahead. Overall, the margin between them is so thin that it falls under my definition of "on par". I think OpenAI is generally a bit more conservative with the messaging here (which is understandable), and they only advertise a model as "more capable" if it beats the other one in every benchmark they track, which AFAIK is the case when it comes to 4o mini vs 3.5 Turbo.
[0]: https://context.ai/compare/gpt-4o-mini/gpt-4
[1]: https://artificialanalysis.ai/models?models_selected=gpt-4o-...