I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.

Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon.

You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems:

https://critpt.com/

Frontier models are still nowhere near solving it, but progress has been rapid.

* o3 (high), <1.5 years ago: 1.4%

* GPT-5.4 (xhigh): 23.4%

* GPT-5.5 (xhigh): 27.1%

* GPT-5.5 Pro (xhigh): 30.6%

https://artificialanalysis.ai/evaluations/critpt.
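
To make the pace concrete, here's a trivial bit of arithmetic on the scores listed above (only the numbers from the list are used; exact release dates beyond the "<1.5 years" are not assumed):

```python
# Score deltas on CritPt between the models listed above.
scores = [
    ("o3 (high)", 1.4),
    ("GPT-5.4 (xhigh)", 23.4),
    ("GPT-5.5 (xhigh)", 27.1),
    ("GPT-5.5 Pro (xhigh)", 30.6),
]

for (prev_name, prev), (curr_name, curr) in zip(scores, scores[1:]):
    print(f"{prev_name} -> {curr_name}: +{curr - prev:.1f} points")

# Total gain of ~29 points in under a year and a half, although most of
# it comes from the single o3 -> GPT-5.4 jump.
print(f"total: +{scores[-1][1] - scores[0][1]:.1f} points")
```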

> there's no reason to believe the progress of LLMs [...] will stop anytime soon

Wrong. Every advancement has followed an S-curve. Where we are on that curve is anyone's guess. Or maybe "this time it's different".

What people miss is that AI isn't one S-curve; each capability we try to bake into a model has its own S-curve. Model progress might not impact some capabilities at all, while other capabilities might get totally overhauled.

Great. You see a shape in graphs. And that shape tells you that _at some unknown point in the future_ progress will slow (but likely not stop).

Now back to the point, what reason do you have to believe progress will stop soon? If you have no reason, then it sounds like you agree with OP.

Which makes the patronizing sarcasm all that much more nauseating.

Nausea aside, what evidence does anyone have that “super intelligence” of the sort your argument alludes to is even possible? Because that’s what we’re really talking about: greater-than-human intelligence on this sort of academic task. For example: when LLMs start contributing meaningfully to their own development, that would be a convincing indicator imo.

This discussion is not about superintelligence; it is about continued progress. Fully general human intelligence at much lower cost than humans is all that is required to profoundly reshape society, but it is not clear even that will happen soon.

As the blog points out, this is one particular subfield where LLMs have much easier prospects: lots of low-hanging fruit that “just” requires a couple of weeks of PhD-candidate research.

Mathematics itself is one of a small handful of endeavors where automated reinforcement training is extremely straightforward and can be done at massive scale without humans.
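
For readers unfamiliar with what "automated reinforcement training with verification" means in practice, here is a minimal sketch; the binary reward scheme and the sympy-based checker are illustrative assumptions, not any lab's actual pipeline:

```python
# Sketch of a verifier-based reward for math RL: candidate solutions are
# graded automatically by symbolic comparison, so no humans are in the loop.
import sympy

def verify(candidate: str, reference: str) -> bool:
    """True if the candidate final answer is symbolically equal to the reference."""
    try:
        return sympy.simplify(sympy.sympify(candidate) - sympy.sympify(reference)) == 0
    except (sympy.SympifyError, TypeError):
        return False

def reward(problem: dict, candidate_answer: str) -> float:
    # Binary reward; this is what makes the setup scale without human graders.
    return 1.0 if verify(candidate_answer, problem["reference_answer"]) else 0.0

# Hypothetical usage: sample many answers per problem, reinforce the ones that verify.
problem = {"statement": "Integrate x**2 from 0 to 1", "reference_answer": "1/3"}
print(reward(problem, "1/3"))     # 1.0
print(reward(problem, "x**3/3"))  # 0.0 -- wrong final answer, no partial credit here
```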

Neither of these factors places a structural bound on the kinds of things LLMs can be good at, but we are far from certain we can achieve performance at this level in other fields economically and in the near future.

> When LLMs start contributing meaningfully to their own development, that would be a convincing indicator imo.

This has been the case for a while now already…

https://kersai.com/the-48-hours-that-changed-ai-forever-clau...

And yet the world hasn’t changed all that much, except for people getting laid off in response to over-hiring prior to the diffusion of LLMs.


This could be right for the current architecture of LLMs, but you can come up with specialized large language models that can more efficiently use tokens for a specific subset of problems by encoding the information differently (https://www.nature.com/articles/d41586-024-03214-7).

So if, instead of text, we come up with a different representation for mathematical or physical problems, that could improve the quality of the output while reducing the amount of transformer compute needed for encoding and decoding I/O and for internal reasoning.
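
As a toy illustration of that representation point (the vocabularies below are made up; real domain-specific encodings like the one in the Nature piece above are far more sophisticated), a symbol-level vocabulary can encode a formula in far fewer tokens than a generic byte-level fallback:

```python
# Toy comparison: byte-level tokenization vs. a hypothetical math-specific
# vocabulary in which each operator/symbol is a single token.
formula = r"\int_0^1 x^2 \, dx = 1/3"

# Generic fallback: one token per byte (roughly what happens when a
# tokenizer has no good merges for unusual notation).
byte_tokens = list(formula.encode("utf-8"))

# Hypothetical domain vocabulary: bounds, operators, and symbols are single tokens.
math_tokens = [r"\int", "_0", "^1", "x", "^2", r"\,", "dx", "=", "1/3"]

print(len(byte_tokens))  # 24
print(len(math_tokens))  # 9
```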

There are also different inference methods, like autoregressive and diffusion, and maybe others we haven't discovered yet.

You combine those variables with the internal arrangement of layers, parameter count, and the actual dataset, and you have such a large search space of possible models that no one can reliably tell whether LLM performance is going to flatline or continue to improve exponentially.

>This could be right for the current architecture of LLMs, but you can come up with specialized large language models that can more efficiently use tokens for a specific subset of problems by encoding the information differently.

That's precisely what happens on the bad side of an S-curve.

There are advancements that do not follow S-curves - consider for instance total data transmitted over all networks, or financial derivatives volumes.

I think a better question for AI is “is it more like a network effect, liquidity effect, or a biological/physical effect”?

Those are measuring the utility of a technological advancement by looking at usage, not the pace of advancement of said technology.

Yes. But quantity has a quality all its own, as they say — derivatives have gone through at least a few step functions where they have become more important and more useful as their usage grows. I’d call that advancement.

Maybe just to be clear: I think the kneejerk “I hate this AI trend, and prefer to believe this will end soon; all exponential growth ends eventually” attitude is intellectually lazy, and dangerous for younger engineers/hackers, a group I hope can benefit from being on HN.

Bitcoin mining went through something like 13 10x growth periods, last I ran the numbers a few years ago. There are physical processes that do have very extended periods of doubling, and there are digital and financial processes that don’t show any signs of doing anything but continuing to keep growing over their multidecade lives. So, like I said, it’s worth thinking carefully, and risk mitigation for things like mental health, career decisions and investment decisions indicates we should be cautious assessing new dynamics.

Got him. That guy always posts with so much bluster lmao.

>There are advancements that do not follow S-curves - consider for instance total data transmitted over all networks, or financial derivatives volumes

Or Roman trade volume before the Fall of Rome.

Not to mention that what you describe is not technological improvement but an increase in data or money flows, which is not the same thing.

Sic transit gloria - obviously.

But I don’t think it’s quite so obvious that model quality / growth / usefulness is definitively not more like data or money flows than it is like some other process.

Total volume of usage is not an advancement, it’s orthogonal.


It’s more of a guess if you don’t know about things like scaling laws and RL with verification. The onus is on the claim that “we’re going to saturate” anytime soon, because every measurement points to that not being true.

He said "will stop anytime soon". He didn't say forever.

Which still makes no sense. There is the same chance that we are flatlining now as that we flatline in, e.g., 3 or 5 years.

In what sense are the models flatlining?

In the sense that the incremental improvements in capabilities that we've been seeing in recent models seem to be taking exponentially growing amounts of compute to achieve.
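
A back-of-the-envelope version of that claim, assuming a Chinchilla-style power law where loss falls as compute to a small negative exponent (the exponent and the 5% target below are purely illustrative):

```python
# Under an assumed power law L(C) = a * C**(-alpha), every fixed
# multiplicative reduction in loss costs a constant *multiplier* in compute,
# i.e. compute grows exponentially with the number of equal-sized gains.
alpha = 0.05        # illustrative exponent, not a measured value
loss_ratio = 0.95   # target: 5% lower loss per "generation"

# Solve (C_new / C_old) ** (-alpha) = loss_ratio for the compute multiplier.
compute_multiplier = loss_ratio ** (-1 / alpha)
print(f"compute multiplier per 5% loss reduction: {compute_multiplier:.1f}x")  # ~2.8x
# n generations of equal gains therefore need roughly 2.8**n times the compute.
```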

But they don't?

Mythos is a 10T model. Opus is a 5T model.

That's not an exponentially growing amount of compute but it is achieving exponential improvements (eg from Mozilla: https://blog.mozilla.org/en/privacy-security/ai-security-zer... )

Compute doesn't necessarily scale linearly with parameters. And how many active parameters do Mythos and Opus actually get their effectiveness from? Is it 1x or 2x? We don't know. We don't even know the parameter counts (the 10T figure is more rumor than confirmation, iirc).
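
For what it's worth, a common rule of thumb is roughly 2 FLOPs per *active* parameter per generated token, so a sparse (MoE) model's per-token cost tracks its active parameters, not its headline total. The totals below are the rumored figures from this thread and the active counts are pure guesses, just to illustrate why total parameters alone say little about compute:

```python
# Rough FLOPs-per-token estimate using the common ~2 * active_params
# approximation for a decoder forward pass. All parameter counts here are
# rumors or outright guesses, used only for illustration.
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

models = {
    # name: (rumored total params, hypothetical active params)
    "Mythos": (10e12, 1.0e12),
    "Opus":   (5e12, 1.5e12),
}

for name, (total, active) in models.items():
    print(f"{name}: {total / 1e12:.0f}T total, {active / 1e12:.1f}T active, "
          f"~{flops_per_token(active) / 1e12:.1f} TFLOPs/token")
# With these made-up numbers the 10T model is *cheaper* per token than the
# 5T one, because it activates fewer parameters.
```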

But even more so, who said the improvements are "exponential"? Mozilla's single metric doesn't prove anything of the sort.

> but it is achieving exponential improvements

“Exponential” used here is pure hyperbole. Can you justify it?

It can be an S-curve (and it almost surely is), but on every chart you can plot, you don't see even an inkling of the bend yet.
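
That's the annoying property of logistic curves: before the inflection point they are numerically almost indistinguishable from a pure exponential, so "no visible bend" tells you very little. A small sketch with arbitrary parameters:

```python
import math

# Logistic K / (1 + exp(-r * (t - t0))) vs. the exponential it approximates
# well before its inflection point at t0. Parameters are arbitrary.
K, r, t0 = 100.0, 1.0, 10.0

def logistic(t: float) -> float:
    return K / (1 + math.exp(-r * (t - t0)))

def exponential(t: float) -> float:
    # For t << t0, logistic(t) ~= K * exp(r * (t - t0)).
    return K * math.exp(r * (t - t0))

for t in [2, 4, 6, 8, 10]:
    print(f"t={t:>2}: logistic={logistic(t):7.3f}  exponential={exponential(t):7.3f}")
# Up to t=6 the two agree to within ~2%; the bend only becomes obvious
# as t approaches the inflection point at t0=10.
```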

Software and hardware have no limits. Theoretically we could use bosons for computation and fit, in one cm3, the same amount of computation as currently exists in the entire world. Same with software: there has never been a stop to new algorithms. With LLMs there are so many parts that will get better, and none of them are very far-fetched.

What the fuck does that have to do with “soon”?

This is FUD and extremely wrong. None of the advancements have followed an S curve. This time IS different and it should be obvious to you at this point.

There are many indications that model progress is slowing down, so that is not entirely accurate.

Model progress at spitting out unhallucinated facts is slowing down hard. Model progress at solving hard math challenges/programming tasks doesn't seem to be slowing down that I can tell.

Please be specific, because outside of anecdotal blog posts by people who don’t know what they’re talking about, it’s not true. Look at scaling laws and composite benchmarks like the Epoch capability index: nothing at all suggests “model progress is slowing down”.

Which indications are those?

The cost factors on the new models compared to the old models.

Qwen3.6 9B is as good as GPT-4o and runs on my M2 MacBook Air. Models are getting stronger and less costly at the same time, but these are somewhat separate branches of research. Frontier labs are spending more because they are still getting marginal returns and there is more capacity to spend than there was a year ago.

You are mixing up cost and progress. The fact that it’s getting more and more expensive doesn’t by itself mean that progress is slowing down.

They are intrinsically linked beyond a certain point. If we're making progress but costs are spiraling exponentially then it stands to reason that we will soon reach a point where we can no longer afford the increasing costs and thus progress will slow.

(barring some breakthrough that reduces costs, which of course may happen, but which recent model improvements are not strong evidence of)


Investment dollars.

Source for that claim?

Nobody is releasing NEW models

…not only is this not true but it also doesn’t matter. Why would this indicate performance saturating?

What constitutes a NEW model for the purposes of calculating progress?

What? DeepSeekV3 just came out and is incredible for the price. Mythos is also half-released.

The standard networking connection has been called “Ethernet” for more than thirty years, so networking has stagnated, right?

If higher-bandwidth networking consisted primarily of running more and more Ethernet lines in parallel, you would most certainly agree that "networking has stagnated".

"Reasoning" and now "Agentic" AI systems are not some fundamental improvement on LLMs, they're just running roughly the same prior-gen LLMS, multiple times.

Hence the conclusion that LLM improvement has slowed down, if not stagnated entirely, and that we should not expect the improvements of switching to these "reasoning" systems to keep happening.

From TFA:

“ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove”

You misunderstand. I'm not saying that Reasoning/Agentic systems aren't better.

I'm saying they're not an advancement in the tech in the way GPT 1 through 3 were. They're a different kind of improvement.

And as such the rate improvement cannot just be extrapolated into the future.

The GPT-1 through GPT-3 advancements were exactly like using more Ethernet cables in parallel.

All the interesting conceptual breakthroughs came after GPT-3: RL and reasoning being the main ones.

Deep Think still makes many, many more mistakes than GPT-5.5 Pro on math.