Hacker News

• Flagship GPT-4.1: top‑tier intelligence, full endpoints & premium features

• GPT-4.1-mini: balances performance, speed & cost

• GPT-4.1-nano: prioritizes throughput & low cost with streamlined capabilities

All share a 1 million‑token context window (vs 120–200k on 4o-o3/o1), excelling in instruction following, tool calls & coding.

Benchmarks vs prior models:

• AIME ’24: 48.1% vs 13.1% (~3.7× gain)

• MMLU: 90.2% vs 85.7% (+4.5 pp)

• Video‑MME: 72.0% vs 65.3% (+6.7 pp)

• SWE‑bench Verified: 54.6% vs 33.2% (+21.4 pp)