Btw, as someone who agrees with your point, what’s the actual answer to this?

Of these, some are mostly obsolete: GPT-4 and GPT-4 Turbo are worse than GPT-4o in both speed and capability, and o1 is worse than o3-mini-high in most respects.

Then, some are not available yet: o3 and o4-mini. And GPT-4.1 I haven't played with enough to have an opinion on.

Among the rest, it depends on what you're looking for:

Multi-modal: GPT-4o > everything else

Reasoning: o1-pro > o3-mini-high > o3-mini

Speed: GPT-4o > o3-mini > o3-mini-high > o1-pro

(My personal favorite is o3-mini-high for most things, as it offers a good tradeoff between speed and reasoning, though I use 4o for simpler queries.)
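To make that concrete, here's a minimal sketch of what that heuristic might look like if you're calling the API rather than using the ChatGPT picker, using the openai Python client. The task categories and the model IDs are my own assumptions based on the rankings above, not anything official, and availability changes often, so check what your account can actually access.

    # A minimal sketch of the model-picking heuristic above, using the
    # openai Python client (pip install openai). The task categories and
    # model IDs are my assumptions, not official guidance.
    from openai import OpenAI

    MODEL_FOR_TASK = {
        "multimodal": "gpt-4o",  # best multi-modal support of the bunch
        "reasoning": "o3-mini",  # o1-pro ranks higher but is much slower
        "simple": "gpt-4o",      # fast enough for everyday queries
    }

    def ask(task: str, prompt: str) -> str:
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model=MODEL_FOR_TASK.get(task, "gpt-4o"),
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    print(ask("reasoning", "How many primes are there below 100?"))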

So where was o1-pro in the comparisons in OpenAI's article? I just don't trust any of these first-party benchmarks anymore.

Is 4.5 not strictly better than 4o?

It depends on how you define "capability", since that means something different for reasoning and non-reasoning models.

What's the problem? For the layman it doesn't actually matter, and for the experts it's usually very obvious which model to use.

LLMs fundamentally have the same constraints no matter how much juice you give them or how much you toy with the models.

That’s not true. I’m a layman and 4.5 is obviously better than 4o for me, definitely enough to matter.

You are definitely not a layman if you know the difference between 4.5 and 4o. The average user thinks AI = OpenAI = ChatGPT.

Well, okay, but I'm certainly not an expert who knows the fine differences between all the models available on chat.com. So I'm somewhere between your definition of "layman" and your definition of "expert" (as are, I suspect, most people on this forum).

If you know the difference between 4.5 and 4o, it'll take you 20 minutes max to figure out the theoretical differences between the other models, which is not bad for a highly technical emerging field.