People should be worried because right now AI is on an exponential growth trajectory and no one knows when it will level off into an s-curve. AI is getting close to good enough. If it becomes twice as good in seven months, then what?

What's the basis for your claim that it is on an exponential growth trajectory? That's not the way it feels to me as a fairly heavy user; it feels more like an asymptotic approach to expert human-level performance, where each new model gets a bit closer but is not yet reaching it, at least in areas where I am expert enough to judge. Improvements since the original ChatGPT don't feel exponential to me.

This also tracks with my experience. Of course, technical progress never looks smooth through the steep part of the s-curve; it's more a sequence of jagged stair-steps (each its own little s-curve in miniature). We might only be at the top of a stair. But my feeling is that we're exhausting the form factor of LLMs. If something new and impressive comes along, it'll be shaped differently and fill a different niche.
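
Part of why this is hard to settle from the inside is that a logistic curve looks exponential until you're well past the inflection point. A quick Python sketch (arbitrary parameters, purely illustrative, not a model of actual AI progress):

    import math

    def exponential(t, rate=0.5):
        return math.exp(rate * t)

    def logistic(t, rate=0.5, ceiling=100.0):
        # Same early growth rate as the exponential, but saturating at `ceiling`.
        return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-rate * t))

    for t in range(0, 16, 3):
        print(f"t={t:2d}  exponential={exponential(t):8.1f}  logistic={logistic(t):8.1f}")

The two curves track each other closely at first and only diverge near the ceiling, which is why "we're on an exponential" and "we're mid s-curve" can both fit the data so far.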

People don't consider that there are real physical/thermodynamic constraints on intelligence. It's easy to imagine some Skynet scenario, but all evidence suggests that it takes significant increases in energy consumption to increase intelligence.

Even in nature this is clear. Humans are a great example: cooked food predates Homo sapiens, and it is widely considered a prerequisite for human-level intelligence because of the enormous energy demands of our brains. And nature has given us wildly more efficient brains in almost every possible way. The human brain runs on about 20 watts of power; my RTX uses 450 watts at full capacity.
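
To put rough numbers on that gap, using the 20 W and 450 W figures above plus a guessed serving rate (the 50 tokens/sec is illustrative, not measured):

    # Brain vs. GPU, back-of-the-envelope.
    BRAIN_WATTS = 20
    GPU_WATTS = 450
    ASSUMED_TOKENS_PER_SEC = 50  # made-up throughput, for illustration only

    print(f"Power ratio (GPU / brain): {GPU_WATTS / BRAIN_WATTS:.1f}x")
    print(f"Brain, 1 hour of thinking: {BRAIN_WATTS} Wh")
    print(f"GPU, 1 hour at full load: {GPU_WATTS} Wh")
    print(f"GPU energy per token (at assumed rate): {GPU_WATTS / ASSUMED_TOKENS_PER_SEC:.1f} J")

That's a 22.5x power gap before any scaling up, which is the thermodynamic point: so far, more capability has meant a lot more energy, not less.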

The idea of "runaway" superintelligence bakes in some very extreme assumptions about the nature of thermodynamics and intelligence that are largely just hand-waved away.

On top of that, AI hasn't changed in a notable way for me personally in a year. The difference between 2022 and 2023 was wild; 2023 to 2024 changed some of my workflows; 2024 to today has mostly meant more options for which tooling I use and how those tools can be combined, but nothing feels fundamentally improved to me.

I was worried about that a couple of years ago, when there was a lot of hope that deeper reasoning skills and hallucination avoidance would simply arrive as emergent properties of a large enough model.

More recently, it seems like that's not the case. Larger models sometimes even hallucinate more [0]. I think the entire sector is suffering from a Dunning-Kruger effect -- making an LLM is difficult, and they managed to get something incredible working in a much shorter timeframe than anyone really expected back in the early 2010s. But that led to overconfidence and hype, and I think there will be a much longer tail of future improvements than the industry would like to admit.

Even the more advanced reasoning models will struggle to play a valid game of chess, much less win one, despite having plenty of chess games in their training data [1]. I think that, combined with the trouble of hallucinations, hints at where the limitations of the technology really are.
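
For anyone who wants to reproduce that kind of test, the core of it is just "ask for a move, check it with a real chess library". A Python sketch using python-chess, with the actual LLM call left as a stub to be wired to whatever API you use:

    import chess  # pip install python-chess

    def ask_model(moves_so_far: list[str]) -> str:
        """Placeholder: return the model's proposed next move in SAN, e.g. 'Nf3'."""
        raise NotImplementedError("plug in your LLM client here")

    def plies_until_illegal(max_plies: int = 200) -> int:
        """Let the model play both sides; count plies before the first illegal move."""
        board = chess.Board()
        history: list[str] = []
        for ply in range(max_plies):
            proposed = ask_model(history)
            try:
                board.push_san(proposed)  # raises ValueError if not legal in this position
            except ValueError:
                print(f"Illegal move at ply {ply}: {proposed!r}")
                return ply
            history.append(proposed)
            if board.is_game_over():
                break
        return len(history)

The point is the one above: staying legal for a whole game is already the hard part, before you even ask about winning.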

Hopefully LLMs will scare society into planning how to handle mass automation of thinking and logic, before a more powerful technology that can really do it arrives.

[0]: https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-m...

[1]: https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...

Really? I find newer models hallucinate less, and I think they still have room for improvement with better training.

I believe hallucinations are partly an artifact of imperfect model training, and thus can be ameliorated with better technique.

Yes, really!

Smaller models may hallucinate less: https://www.intel.com/content/www/us/en/developer/articles/t...

The RAG technique uses a smaller model plus an external knowledge base that's queried based on the prompt. It lets small models hallucinate far less than much larger ones, at the cost of performance. That is, to eliminate hallucinations, we should alter how the model works, not increase its scale: https://highlearningrate.substack.com/p/solving-hallucinatio....
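
Concretely, the retrieval step looks something like the following. A minimal Python sketch of the RAG pattern described above, with TF-IDF standing in for a real embedding index and the generation model left as a stub:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    KNOWLEDGE_BASE = [
        "The human brain runs on roughly 20 watts of power.",
        "GPT-4 was released in March 2023.",
        "Cooked food predates Homo sapiens.",
    ]

    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(KNOWLEDGE_BASE)

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Return the k knowledge-base entries most similar to the query."""
        scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
        return [KNOWLEDGE_BASE[i] for i in scores.argsort()[::-1][:k]]

    def ask_model(prompt: str) -> str:
        """Placeholder for the (smaller) generation model."""
        raise NotImplementedError

    def rag_answer(question: str) -> str:
        context = "\n".join(retrieve(question))
        return ask_model(
            "Answer using ONLY the context below. If the answer is not there, "
            f"say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
        )

The hallucination win comes from the last step: the model is asked to stay inside the retrieved context instead of free-associating from its weights, which is exactly "alter how the model works, not increase its scale".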

Pruned models, with fewer parameters, generally have a lower hallucination risk: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00695.... "Our analysis suggests that pruned models tend to generate summaries that have a greater lexical overlap with the source document, offering a possible explanation for the lower hallucination risk."
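
"Lexical overlap" there is easy to picture with a toy metric: what fraction of the summary's tokens actually appear in the source. This is a simplification of whatever the paper measures, but it shows the idea:

    import re

    def tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    def lexical_overlap(summary: str, source: str) -> float:
        """Fraction of summary tokens that also occur in the source document."""
        summary_tokens = tokens(summary)
        source_vocab = set(tokens(source))
        if not summary_tokens:
            return 0.0
        return sum(t in source_vocab for t in summary_tokens) / len(summary_tokens)

    source = "The human brain runs on about 20 watts of power."
    faithful = "The brain runs on about 20 watts."
    embellished = "The brain consumes megawatts, rivaling a data center."
    print(lexical_overlap(faithful, source))     # 1.0 -- stays close to the source
    print(lexical_overlap(embellished, source))  # 0.25 -- mostly invented wording

A pruned model that sticks closer to the source scores high on this kind of measure, which is the paper's proposed explanation for the lower hallucination risk.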

At the same time, all of this should be contrasted with the "Bitter Lesson" (https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...). IMO, making a larger LLM does indeed produce a generally superior LLM: it produces more trained responses to a wider set of inputs. However, it does not change that it's an LLM, so fundamental traits of LLMs - like hallucinations - remain.

Let's look:

GPT-1 June 2018

GPT-2 February 2019

GPT-3 June 2020

GPT-4 March 2023

Claude tells me this is the rough improvement of each:

GPT-1 to 2: 5-10x

GPT-2 to 3: 10-20x

GPT-3 to 4: 2-4x

Now it's been 2.5 years since 4.

Are you expecting 5 to be 2-4x better, or 10-20x better?
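
For reference, here are the gaps between those releases, with the rough multipliers above taken at face value purely for the sake of the question (they're Claude's guesses, not measurements):

    from datetime import date

    releases = {
        "GPT-1": date(2018, 6, 1),
        "GPT-2": date(2019, 2, 1),
        "GPT-3": date(2020, 6, 1),
        "GPT-4": date(2023, 3, 1),
    }
    rough_gain = {("GPT-1", "GPT-2"): 7.5, ("GPT-2", "GPT-3"): 15.0, ("GPT-3", "GPT-4"): 3.0}

    names = list(releases)
    for a, b in zip(names, names[1:]):
        months = (releases[b].year - releases[a].year) * 12 + (releases[b].month - releases[a].month)
        gain = rough_gain[(a, b)]
        print(f"{a} -> {b}: {months:2d} months, ~{gain}x total, ~{gain ** (1 / months):.2f}x per month")

Each step took longer and delivered a smaller multiple per month, and the 2.5 years since GPT-4 is already nearly as long as the longest of those gaps. That's the shape of the question.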

How are you measuring this improvement factor? We have numerous benchmarks for LLMs and they are all saturating. We are rapidly approaching AGI by that measure, and headed towards ASI. They still won't be "human" but they will be able to do everything humans can, and more.
