I'm old enough to remember the mystery and hype before o*/o1/strawberry that was supposed to be essentially AGI. We had serious news outlets write about senior people at OpenAI quitting because o1 was SkyNet
Now we're up to o4, AGI is still nowhere in sight (depending on your definition, I know). And OpenAI is up to about 5000 employees. I'd think even before AGI, a new model would be able to cover for at least 4500 of those employees being fired, is that not the case?
Remember that Docusign has 7,000 employees. I think OpenAI is pretty lean for what they're accomplishing.
I don't think these comparisons are useful. Every time you look at companies like LinkedIn or Docusign, yeah - they have a lot of staff, but a significant proportion of them work in functions like sales, customer support, and regulatory compliance across a bazillion different markets, along with all the internal tooling and processes you need to support that.
OpenAI is at a much earlier stage in their adventures and probably doesn't have that much baggage. Given their age and revenue streams, their headcount is quite substantial.
If we're making comparisons, it's more like someone selling a $10,000 course on how to be a millionaire.
Not directly from OpenAI, but people in the industry are advertising how these advanced models can replace employees, yet they keep going on hiring tears (including OpenAI). Let's see the first company stand behind their models and replace 50% of their existing headcount with agents. That, to me, would be a sign these things are going to replace people's jobs. Until I see that: if OpenAI can't figure out how to replace humans with models, then no one will.
I mean, could you imagine if today's announcement was that the chatgpt.com web dev team has been laid off, and all new features and fixes will be completed by Codex CLI + o4-mini? That would mean they believe in the product they're advertising. Until they do something like that, they'll keep trusting those human engineers and try selling other people on the dream.
I'm also a skeptic on AI replacing many human jobs anytime soon. It's mostly going to assist, accelerate or amplify humans in completing work better or faster. That's the typical historical technology cycle where better tech makes work more efficient. Eventually that does allow the same work to be done with fewer people, like a better IP telephony system enabling a 90-person call center to handle the same call volume that previously required 100 people. But designing, manufacturing, selling, installing and supporting the new IP phone system also creates at least 10 new jobs.
So far the only significant human replacement I'm seeing AI enable is in low-end, entry-level work. For example, fulfilling "gig work" on Fiverr, like spending an hour or two whipping up a relatively low-quality graphic logo or other basic design work for $20. This is largely done at home by entry-level graphic design students in second-world locales like the Philippines or rural India. A good graphical AI can take (and is taking) some of this work from the humans doing it. Although it's not even a big impact yet, primarily because for non-technical customers, the Fiverr workflow can still be easier or more comfortable than figuring out which AI tool to use and how to get what they really want from it.
The point is that this Fiverr piecemeal gig work is the lowest-paying, least desirable work in graphic design. No one doing it wants to still be doing it a year or two from now. It's the McDonald's counter of their industry. They all aspire to higher-skill, higher-paying design jobs. They're only doing Fiverr gig work because they don't yet have a degree, enough resume credits or decent portfolio examples. Much like steam shovels and pile drivers displaced pickaxe-swinging humans digging railroad tunnels in the 1800s, the new technology is displacing some of the least desirable, lowest-paying jobs first. I don't yet see any clear reason this well-established 200+ year trend will be fundamentally different this time. And history is littered with those who predicted "but this time it'll be different."
I've read the scenarios which predict that AI will eventually be able to fundamentally and repeatedly self-improve autonomously, at scale and without limit. I do think AI will continue to improve but, like many others, I find the "self-improve" step to be a huge and unevidenced leap of faith. So, I don't think it's likely, for reasons I won't enumerate here because domain experts far smarter than I am have already written extensively about them.
Not really. It could also mean their company's effective headcount is much greater than its nominal one.
Yes and Amazon has 1.52 million employees. How many developers could they possibly need?
Or maybe it’s just nonsensical to compare the number of employees across companies - especially when they don’t do nearly the same thing.
On a related note, wait until you find out how many more employees Apple has than Google, since Apple has tens of thousands of retail employees.
Apple has fewer employees than Google (164k < 183k).
Siri must be really good.
What kind of employees does Docusign employ? Surely digital documents don't require physical on-site distribution centers and labor.
Just look at their careers page
It's a lot of sales/account managers, and some engineers.
Wow, sales really goes hard for this product.
[flagged]
The US is not a signatory to the International Criminal Court so you won't see Musk on trial there.
I hope I don't have to link this adjacent reply of mine too many more times: https://news.ycombinator.com/item?id=43709056 Specifically "The venue is a matter of convenience, nothing more," and if you prefer another, that would work about as well. Perhaps Merano; I hear it's a lovely little town.
The closest Elon ever came to anything Hague-worthy is allowing Starlink to be used in Ukrainian attacks on Russian civilian infrastructure. I don't think the Hague would be interested in anything like that. And if his life is worthless, then what would you say about your own? Nonetheless, I commend you on your complete lack of hinges. /s
Oh, I'm thinking more in the sense of the special one-off kinds of trials, the sort Gustave Gilbert so ably observed. The venue is a matter of convenience, nothing more. To the rest I would say the worth of my life is no more mine to judge than anyone else is competent to do the same for themselves, or indeed other than foolish to pursue the attempt.
True.
Deep learning models will continue to improve as we feed them more data and use more compute, but they will still fail at even very simple tasks as long as the input data are outside their training distribution. The numerous examples of ChatGPT (even the latest, most powerful versions) failing at basic questions or tasks illustrate this well. Learning from data is not enough; there is a need for the kind of system-two thinking we humans develop as we grow. It is difficult to see how deep learning and backpropagation alone will help us model that. https://medium.com/thoughts-on-machine-learning/why-sam-altm...
> I'm old enough to remember the mystery and hype before o*/o1/strawberry
So at least two years old?
Honestly, sometimes I wonder if most people these days kinda aren't at least that age, you know? Or less inhibited about acting it than I believe I recall people being last decade. Even compared to just a few years back, people seem more often to struggle to carry a thought, and resort much more quickly to emotional belligerence.
Oh, not that I haven't been as knocked about in the interim, of course. I'm not really claiming I'm better, and these are frightening times; I hope I'm neither projecting nor judging too harshly. But even trying to discount for the possibility, there still seems something new left to explain.
> Even compared to just a few years back, people seem more often to struggle to carry a thought, and resort much more quickly to emotional belligerence.
We're living in extremely uncertain times, with multiple global crises taking place at the same time, each of which could develop into a turning point for humankind.
At the same time, predatory algorithms do whatever it takes to make people addicted to media, while mental health care remains inaccessible for many.
I feel like throwing a tantrum almost every single day.
I feel perhaps I've been unkind to many people in my thoughts, but I'm conflicted. I don't understand myself to be particularly fearless, but what times call more for courage than times like these? How do people afraid even to try to practice courage expect to find it, when there isn't time for practice any more?
You have only so many spoons available per crisis. Even picking your battle can become a problem.
I've been out in the streets, protesting and raising awareness of climate change. I no longer do. It's a pointless waste of time. Today, the climate change deniers are in charge.
I don't assume I'm going to be given the luxury of picking my battles, and - though I've been aware of "spoon theory" since I watched it getting invented at Shakesville back in the day - I've never held to it all that strongly, even as I acknowledge I've also never been quite the same since a nasty bout of wild-type covid in early 2020. Now as before, I do what needs doing as best I can, then count the cost. Some day that will surely prove too high, and my forward planning efforts will be put to the test. Till then I'm happy not to borrow trouble.
I've lived in this neighborhood a long time, and there are a couple of old folks' homes a block or so from here. Both have excellent views, on one frontage each, of an extremely historic cemetery, which I have always found a wonderfully piquant example of my adopted hometown's occasionally wire-brush sense of humor. But I bring it up to mention that the old folks don't seem to have much concern for spoons other than to eat with, and they are protesting the present situation regularly and at considerable volume, and every time I pass about my errands I make a point of raising a fist and hollering "hell yeah!" just like most of the people who drive past honk in support.
Will you tell them it's pointless?
I think people expected reasoning to be more than just trained chain of thought (which was already known at the time). On the other hand, it is impressive that CoT can achieve so much.
Yeah, I don't know exactly what an AGI model will look like, but I think it would have more than a 200k context window.
Do you have a 200k context window? I don't. Most humans can only keep 6 or 7 things in short-term memory. Beyond those 6 or 7, you are pulling data from your latent space, or replacing one of the short-term slots with new content.
But context windows for LLMs include all the “long term memory” things you’re excluding from humans
Long term memory in an LLM is its weights.
Not really, because humans can form long term memories from conversations, but LLM users aren’t finetuning models after every chat so the model remembers.
He's right, but most people don't have the resources, nor indeed the weights themselves, to keep training the models. But the weights are very much long term memory.
> users aren’t finetuning models after every chat
Users can do that if they want, but it’s more effective and more efficient to do that after every billion chats, and I’m sure OpenAI does it.
If you want the entire model to remember everything it talked about with every user, sure. But ideally, I would want the model to remember what I told it a few million tokens ago, but not what you told it (because to me, the model should look like my private copy that only talks to me).
> ideally, I would want the model to remember what I told it a few million tokens ago
Yes, you can keep finetuning your model on every chat you have with it. You can definitely make it remember everything you have ever said. LLMs are excellent at remembering their training data.
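Here's a minimal sketch of what that looks like in practice, assuming a small open model like GPT-2 plus the Hugging Face transformers and datasets libraries (the chat transcript below is made up):

    from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
    from datasets import Dataset

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

    # Made-up chat transcript; in reality you'd append every finished conversation here.
    chats = [
        "User: My dog is named Bruno and he hates thunderstorms.\n"
        "Assistant: Noted - Bruno, scared of thunder.",
    ]

    def tokenize(batch):
        enc = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
        enc["labels"] = enc["input_ids"].copy()  # causal LM objective: predict the text itself
        return enc

    ds = Dataset.from_dict({"text": chats}).map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="chat-finetune",
                               num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=ds,
    )
    trainer.train()  # after this, "Bruno" lives in the weights, not in a context window

In a real run you'd mask the padding tokens out of the loss and accumulate many chats per update, but the point stands: after trainer.train(), that conversation is recalled from the weights, not from a context window.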
I'm not quite AGI, but I work quite adequately with a much, much smaller memory. Maybe AGI just needs to know how to use other computers and work with storage a bit better.
I'd think it would be able to at least suggest which model to use rather than just having 6 for you to choose from.
I’m not an AI researcher but I’m not convinced these contemporary artificial neural networks will get us to AGI, even assuming an acceleration to current scaling pace. Maybe my definition of AGI is off but I’m thinking what that means is a machine that can think, learn and behave in the world in ways very close to human. I think we need a fundamentally different paradigm for that. Not something that is just trained and deployed like current models, but something that is constantly observing, constantly learning and constantly interacting with the real world like we do. AHI, not AGI. True AGI may not exist because there are always compromises of some kind.
But we don’t need AGI/AHI to transform large parts of our civilization. And I’m not seeing this happen either.
I feel like every time AI gets better we shift the goalposts of AGI to something else.
I don't think we shift the goalposts for AGI. I'm not getting the sense that people are redefining what AGI is when a new model is released. I'm getting the sense that some people are thinking like me when a new model is released: we got a better transformer, and a more useful model trained on more or better data, but we didn't get closer to AGI. And people are saying this not because they've pushed out what AGI really means, they're saying this because the models still have the same basic use cases, the same flaws and the same limitations. They're just better at what they already do. Also, the better these models get at what they already do, the more starkly they contrast with human capabilities, for better or worse.
> Now we're up to o4, AGI is still nowhere in sight (depending on your definition, I know)
It's not only a matter of definition. Some Googler was sure their model was conscious.
Meanwhile even the highest ranked models can’t do simple logic tasks. GothamChess on YouTube did some tests where he played against a bunch of the best models and every single one of them failed spectacularly.
They’d happily lose a queen to take a pawn. They failed to understand how pieces are even allowed to move, hallucinated the existence of new pieces, repeatedly declared checkmate when it wasn’t, etc.
I tried it last night with Gemini 2.5 Pro, and it made it 6 turns before it started making illegal moves, and 8 turns before it got so confused about the state of the board that it refused to play with me any longer.
I was in the chess club in 3rd grade. One of the top ranked LLMs in the world is vastly dumber than I was in 3rd grade. But we’re going to pour hundreds of billions into this in the hope that it can end my career? Good luck with that, guys.
Chess is not exactly a simple logic task. It requires you to keep track of 32 things in a 2d space.
I remember being extremely surprised when I could ask GPT3 to rotate a 3d model of a car in its head and ask it about what I would see when sitting inside, or which doors would refuse to open because they're in contact with the ground.
It really depends on how much you want to shift the goalposts on what constitutes "simple".
> Chess is not exactly a simple logic task.
Compared to what a software engineer is able to do, it is very much a simple logic task. Or to what the average person with a non-trivial job does. Or to a beehive organizing its existence, from its amino acids up to hive organization. All those things are magnitudes harder than chess.
> I remember being extremely surprised when I could ask GPT3 to rotate a 3d model of a car in its head and ask it about what I would see when sitting inside, or which doors would refuse to open because they're in contact with the ground.
It's not reasoning its way there. Somebody asked something similar somewhere in the corpus, and that corpus also contained the answers. That's why it can answer. After quite a small number of moves, the chess board is unique and you can't fake it. You need to think ahead, a task which computers (and even trained chess players) are traditionally very good at. That LLMs are not goes to show that they are very far from AGI.
I'm not sure why people are expecting a language model to be great at chess. Remember they are trained on text, which is not the best medium for representing things like a chess board. They are also "general models", with limited training on pretty much everything apart from human language.
An AlphaStar-type model would wipe the floor at chess.
This misses the point. LLMs will do things like move a knight by a single square as if it were a pawn. Chess is an extremely well understood game, and the rules about how the pieces move are almost certainly well represented in the training data.
These models cannot even make legal chess moves. That’s incredibly basic logic, and it shows how LLMs are still completely incapable of reasoning or understanding. Many kinds of tasks are never going to be possible for LLMs unless that changes. Programming is one of those tasks.
> These models cannot even make legal chess moves. That’s incredibly basic logic, and it shows how LLMs are still completely incapable of reasoning or understanding.
Yeah they can. There's a link I shared to prove it which you've conveniently ignored.
LLMs learn by predicting, failing and getting a little better, rinse and repeat. Pre-training is not like reading a book. LLMs trained on chess games play chess just fine. They don't make the silly mistakes you're talking about and they very rarely make illegal moves.
There's gpt-3.5-turbo-instruct, which I already shared, and it plays at around 1800 Elo. Then there's this grandmaster-level chess transformer: https://arxiv.org/abs/2402.04494. There are also a couple of models that were trained in the EleutherAI Discord that reached about 1100-1300 Elo.
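For anyone curious what that takes, here's roughly the kind of PGN-prompting harness those evals use, assuming the openai Python SDK, an API key in your environment, and continued access to the legacy completions endpoint for gpt-3.5-turbo-instruct (the game headers and moves below are just made up for illustration):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Feed the game so far as a PGN transcript and let the model complete the next move.
    pgn_so_far = (
        '[White "Magnus Carlsen"]\n'
        '[Black "Hikaru Nakamura"]\n'
        '[Result "*"]\n\n'
        "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4."
    )

    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # the legacy completions model, not the chat one
        prompt=pgn_so_far,
        max_tokens=6,
        temperature=0.0,
    )
    print(resp.choices[0].text)  # e.g. " Ba4" - White's next move, continuing the PGN

You'd validate the completion against a real chess library, append the legal move, and repeat; the model only ever sees the game as plain text.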
I don't know what the peak of LLM chess playing looks like, but this is clearly less of an 'LLMs can't do this' problem and more of an 'OpenAI/Anthropic/Google etc. don't care whether their models can play chess or not' problem.
So are they capable of reasoning now, or would you like to shift the goalposts?
I think the point here is that if you have to pretrain it for every specific task, it's not artificial general intelligence, by definition.
There isn't any general intelligence that isn't receiving pre-training. People spend 14 to 18+ years in school to have any sort of career.
You don't have to pretrain it for every little thing but it should come as no surprise that a complex non-trivial game would require it.
Even if you explained all the rules of chess clearly to someone brand new to it, it will be a while and lots of practice before they internalize it.
And like I said, LLM pre-training is less like a machine reading text and more like evolution. If you gave it a corpus of chess rules, you'd only be training a model that knows how to converse about chess rules.
Do humans require less 'pre-training' ? Sure, but then again, that's on the back of millions of years of evolution. Modern NNs initialize random weights and have relatively very little inductive bias.
People are focusing on chess, which is complicated, but LLMs fail at even simple games like tic-tac-toe, where you'd think, if they were capable of "reasoning", they would be able to understand where they went wrong. That doesn't seem to be the case.
What it can do is write and execute code to generate the correct output, but isn't that cheating?
Which SOTA LLM fails at tic-tac-toe?
Saying programming is a task that is "never going to be possible" for an LLM is a big claim, given how many people have derived huge value from having LLMs write code for them over the past two years.
(Unless you're arguing against the idea that LLMs are making programmers obsolete, in which case I fully agree with you.)
I think "useful as an assistant for coding" and "being able to program" are two different things.
When I was trying to understand what is happening with hallucination, GPT gave me this:

> It's called hallucinating when LLMs get things wrong because the model generates content that sounds plausible but is factually incorrect or made-up—similar to how a person might "see" or "experience" things that aren't real during a hallucination.
From that we can see that they fundamentally don't know what is correct. While they can get better at predicting correct answers, no-one has explained how they are expected to cross the boundary from "sounding plausible" to "knowing they are factually correct". All the attempts so far seem to be about reducing the likelihood of hallucination, not fixing the problem that they fundamentally don't understand what they are saying.
Until/unless they are able to understand the output enough to verify the truth then there's a knowledge gap that seems dangerous given how much code we are allowing "AI" to write.
Code is one of the few applications of LLMs where they DO have a mechanism for verifying if what they produced is correct: they can write code, run that code, look at the output and iterate in a loop until it does what it's supposed to do.
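Concretely, that loop is nothing fancier than something like this sketch; call_llm is a hypothetical stand-in for whichever model API you're using:

    import subprocess, sys, tempfile

    def call_llm(prompt: str) -> str:
        """Placeholder: send the prompt to your model of choice and return the code it writes."""
        raise NotImplementedError

    def generate_until_it_runs(task: str, max_attempts: int = 5) -> str:
        feedback = ""
        for _ in range(max_attempts):
            code = call_llm(f"Write a Python script that {task}.\n{feedback}")
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True, timeout=30)
            if result.returncode == 0:
                return code  # it actually ran; hand it back
            # Feed the error output back so the next attempt can correct itself.
            feedback = f"The previous attempt failed with:\n{result.stderr}\nFix it."
        raise RuntimeError("no working program within the attempt budget")

Whether the program does the right thing still needs tests or a human review, but "does it even run" is checkable automatically, which is more than most LLM output gets.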
> I'm not sure why people are expecting a language model to be great at chess.
Because the conversation is about AGI, and how far away we are from AGI.
Does AGI mean good at chess?
What if it is a dumb AGI?
Claude can't beat Pokemon Red. Not even close yet: https://arstechnica.com/ai/2025/03/why-anthropics-claude-sti...
LLMs can play chess fine.
The best model you can play against is decent by human standards - https://github.com/adamkarvonen/chess_gpt_eval
SOTA models can't play it because these companies don't really care about it.
> We had serious news outlets write about senior people at OpenAI quitting because o1 was SkyNet
I wonder if any of the people that quit regret doing so.
Seems a lot like Chicken Little behavior - "Oh no, the sky is falling!"
How anyone with technical acumen thinks current AI models are conscious, let alone capable of writing new features and expanding their abilities is beyond me. Might as well be afraid of calculators revolting and taking over the world.