Hacker News

highfrequency 6 hours ago

Can you be more specific about which math results you are talking about? It looks like a significant improvement on FrontierMath, especially for the Pro model (which uses the most inference-time compute).

ZeroCool2u 6 hours ago

FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks where I noticed this.

csnweb 6 hours ago

Are you maybe comparing the Pro model to the non-Pro model with thinking? Granted, it's a bit confusing, but the Pro model is 10 times more expensive and probably much larger as well.

ZeroCool2u 6 hours ago

Ah yes, okay that makes more sense!