Something that caught my eye from the announcement:

> GPT‑5.3‑Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training

I'm happy to see the Codex team moving to this kind of dogfooding. I think this was critical for Claude Code to achieve its momentum.

Sounds like the researchers behind https://ai-2027.com/ haven't been too far off so far.

We'll see. The first two things that they said would move from "emerging tech" to "currently exists" by April 2026 are:

- "Someone you know has an AI boyfriend"

- "Generalist agent AIs that can function as a personal secretary"

I'd be curious how many people know someone who is sincerely in a relationship with an AI.

And I'd also love to know anyone who has honestly replaced their human assistant/secretary with an AI agent. I have an assistant, and they're valuable far beyond rote input-output tasks. I also encourage my assistant to use LLMs where they're helpful, like supplementing research tasks.

Fundamentally though, I just don't think any AI agents I've seen can legitimately function as a personal secretary.

Also they said by April 2026:

> 22,000 Reliable Agent copies thinking at 13x human speed

And when moving from "Dec 2025" to "Apr 2026" they switch "Unreliable Agent" to "Reliable Agent". So again, we'll see. I'm very doubtful given the whole OpenClaw mess. Nothing about that says "two months away from reliable".

> Someone you know has an AI boyfriend

r/MyBoyfriendIsAI is a thing

> Generalist agent AIs that can function as a personal secretary

Isn't that what MoltBot/OpenClaw is all about?

So far these look like successful predictions.

Moltbot is an attempt to do that. Would you hire it as a personal secretary and entrust all your personal data to it?

Only people who haven't had a secretary would think it's a personal secretary.

Like, it can't even answer the phone.

There are plenty of companies that sell an AI assistant that answers the phone as a service, they just aren't named OpenAI or Anthropic. They'll let callers book an appointment onto your calendar, even!

No, there are companies that sell voice-activated phone trees, but no one is getting results out of unstructured, arbitrary phone-call answering with actions taken by an LLM.

I'm sure there are research demos in big companies, I'm sure some AI bro has done this with the Twilio API, but no one is seriously doing this.

All it takes is one "can you take this to the post office", the simplest of requests, and you're in a dead end of at best refusal, but more likely role-play.

Agreed that “unstructured arbitrary phone calls + arbitrary actions” is where things go to die.

What does work in production (at least for SMB/customer-support style calls) is making the problem less magical:

1) narrow domain + explicit capabilities (book/reschedule/cancel, take a message, basic FAQs)

2) strict tool whitelist + typed schemas + confirmations for side effects

3) robust out-of-scope detection + graceful handoff (“I can’t do that, but I can X/Y/Z”)

4) real logs + eval/test harnesses so regressions get caught

Once you do that, you can get genuinely useful outcomes without the role-play traps you’re describing (a rough sketch of the whitelist/confirmation layer is below).
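To make 2) and 3) concrete, here's a minimal sketch. The tool names, schemas, and handoff wording are all illustrative, not any vendor's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    schema: dict             # expected argument names -> types
    side_effect: bool        # True -> require caller confirmation first
    run: Callable[..., str]

# Strict whitelist: the model can only ever select from these.
TOOLS = {
    "book_appointment": Tool(
        schema={"date": str, "time": str, "name": str},
        side_effect=True,
        run=lambda date, time, name: f"Booked {name} on {date} at {time}",
    ),
    "take_message": Tool(
        schema={"from_name": str, "message": str},
        side_effect=False,
        run=lambda from_name, message: f"Message from {from_name} recorded",
    ),
}

def dispatch(tool_name: str, args: dict, confirmed: bool = False) -> str:
    # Out-of-scope detection: anything not whitelisted gets a graceful
    # handoff, never improvisation.
    tool = TOOLS.get(tool_name)
    if tool is None:
        return "I can't do that, but I can book appointments or take a message."
    # Typed-schema check before execution.
    for field, typ in tool.schema.items():
        if not isinstance(args.get(field), typ):
            return f"I still need your {field.replace('_', ' ')}."
    # Side effects require an explicit confirmation turn.
    if tool.side_effect and not confirmed:
        return f"Just to confirm, you want me to {tool_name.replace('_', ' ')}?"
    return tool.run(**args)

print(dispatch("take_this_to_the_post_office", {}))
print(dispatch("book_appointment", {"date": "2026-04-01", "time": "10:00", "name": "Ada"}))
print(dispatch("book_appointment", {"date": "2026-04-01", "time": "10:00", "name": "Ada"}, confirmed=True))
```

The point is that the LLM only chooses among whitelisted, typed actions, and anything it can't map to one falls through to the handoff instead of role-play.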

We’ve been building this at eboo.ai (voice agents for businesses). If you’re curious, happy to share the guardrails/eval setup we’ve found most effective.

https://www.instagram.com/p/DMfpj0hM7e0/

is obviously a staged demo but it seems pretty serious for him. He's wearing a suit and everything!

https://www.instagram.com/p/DK8fmYzpE1E/

seems like research by some dude (no disrespect; he doesn't seem like he's at a big company, though).

https://www.instagram.com/p/DH6EaACR5-f/

could be astroturf, but seems maybe a little bit serious.

[deleted]

It's important to remember, though (this is beside the point of what you're saying), that AI displacing jobs like secretaries does not require a near-perfect replacement. There are many other factors: for example, if it's much cheaper and can do part of the work, it can dramatically shrink demand as people shift to an imperfect AI replacement.

I think they immediately corrected their median timelines for takeoff to 2028 upon releasing the article (I believe there was a math mistake or something initially), so all those dates can probably be bumped back a few months. Regardless, the trend seems fairly on track.

People have been in love with machines for a long time. It's just that the machines didn't talk back so we didn't grant them the "partner" status. Wait for car+LLM and you'll have a killer combo.

KITT, is that you?

I don't think generative AI is even close to making model development 50% faster

Only on HN will people still doubt what is happening right in front of their eyes. I understand that putting things into perspective is important; still, the type of downplaying we can see in the comments here is not only funny but also has a dangerous dimension to it. Ironically, these are the exact same people who will claim "we should have prepared better!" once the effects become more and more visible. Dear super engineers, while I feel sorry that your job and passion are becoming a commodity right in front of you, please stay out of the way.

Is GPT-5.3 200x bigger than GPT-4? Looks like OpenAI used this fanfiction as its marketing strategy.

> researchers

that's certainly one way to refer to Scott Alexander

Scott Alexander essentially provided editing and promotion for AI 2027 (and did a great job of it, I might add). Are you unaware of the actual researchers behind the forecasting/modelling work behind it, and you thought it was actually all done by a blogger? Or are you just being dismissive for fun?

tbh mostly dismissive of Scott Alexander for fun, couldn't quite help myself

More importantly, these are the early steps of a model improving itself.

Do we still think we'll have a soft takeoff?

> Do we still think we'll have a soft takeoff?

There's still no evidence we'll have any takeoff. At least in the "Foom!" sense of LLMs iteratively and independently improving themselves to substantially new levels, sustained reliably over many generations.

To be clear, I think LLMs are valuable and will continue to significantly improve. But self-sustaining runaway positive feedback loops delivering exponential improvements resulting in leaps of tangible, real-world utility is a substantially different hypothesis. All the impressive and rapid achievements in LLMs to date can still be true while major elements required for Foom-ish exponential takeoff are still missing.

Yes, but also you'll never have any early evidence of the Foom until the Foom itself happens.

If only General Relativity had such an ironclad defense of being as unfalsifiable as the Foom Hypothesis is. We could've avoided all of the quantum physics nonsense.

It doesn't mean it's unfalsifiable: it's a prediction about the future, so you can falsify it once there's a bound on when it's supposed to happen. It just means there's little to no warning. I think a significant risk of AI progress is that it can reach an improvement speed greater than the speed at which we can register warnings or threats from that improvement.

To me, FOOM means the hardest of hard takeoffs; improving at a sustained rate that's merely higher than it would be with humans alone is not a takeoff at all.

This has already been going on for years. It's just that they were using GPT-4.5 to work on GPT-5. All this announcement means is that they're confident enough in early GPT-5.3 output to further refine GPT-5.3 based on it. But yes, takeoff will still happen because this recursive self-improvement works; it's just that we're already past the inception point.

I can't tell if this is a serious conversation anymore.

I think it's important in AI discussions to reason correctly from fundamentals and not disregard possibilities simply because they seem like fiction/absurd. If the reasoning is sound, it could well happen.

“Best start believing in science fiction stories. You're in one.”

https://x.com/TheZvi/status/2017310187309113781

I totally get what you felt there. We are truly living in a sci-fi world.

I guess humans were involved in all that, so how is that anything but tool use?

I think the limiting factor is capital, not code. And I doubt GPTX is any more competent at raising funds than the other, fleshy snake-oil salesmen...

Exponential growth may look like a very slow increase at first, but it's still exponential growth.

Sigmoids may look like exponential growth at first, until they saturate. Early growth alone cannot distinguish between them.
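A quick numerical illustration (all parameters arbitrary): a logistic curve and a pure exponential with the same rate constant k track each other closely far below the inflection point, then diverge sharply:

```python
import math

L, k, t0 = 1000.0, 1.0, 10.0  # illustrative carrying capacity, rate, inflection

def logistic(t):
    # Saturating growth: L / (1 + e^{-k(t - t0)})
    return L / (1 + math.exp(-k * (t - t0)))

def exponential(t):
    # Early-time approximation of the logistic: L * e^{k(t - t0)}
    return L * math.exp(k * (t - t0))

for t in [0, 2, 4, 8, 10, 14]:
    print(f"t={t:>2}  logistic={logistic(t):10.3f}  exponential={exponential(t):10.3f}")
```

At t = 0 the two agree to about four significant figures; at the inflection point (t = 10) the exponential is already 2x the logistic, and past it they have nothing in common.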

Intelligence must be sigmoidal, of course, but it may not saturate until well past human intelligence.

Intelligence might be more like an optimization problem, fitting inputs to optimal outputs. Sometimes reality is simply too chaotic to model precisely so there is a limit to how good that optimization can be.

It would be like distance to the top of a mountain: even if someone is 10x closer, they could still only be within arm's reach.

On the other hand: Perception of change might not be linear but logarithmic.

(i.e., it might take an order of magnitude of improvement to be perceived as a substantial upgrade)

So the perceived rate of change might be linear.

It's definitely true for some things, such as wealth (rough numbers below):

- $2000 is a lot if you have $1000.

- It's a substantial improvement if you have $10,000.

- It's not a lot if you have $1M.

- It does not matter if you have $1B.
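A minimal sketch of that intuition (using base-10 log as an arbitrary model of perception): measure each gain as log(new/old), and the same $2000 shrinks roughly in inverse proportion to what you already have:

```python
import math

# Perceived gain of the same $2000, modeled as log10(new / old).
for wealth in [1_000, 10_000, 1_000_000, 1_000_000_000]:
    perceived = math.log10((wealth + 2_000) / wealth)
    print(f"${wealth:>13,} -> +$2000 feels like {perceived:.7f}")
```

From $1000 it's about 0.48; from $1B it's under a millionth of that.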

$2000 is not substantial over $1b on the linear scale

$2k is the same on the linear scale no matter where you are; that's what the linear scale is about.

You're already interpreting this on the log scale.

That's if it's exponential growth. It may just as well be some slow growth and continue to be so.

I'm only saying no to stay optimistic, tbh.

It feels crazy to just say we might see a fundamental shift in 5 years.

But the current additions to compute, research, etc. definitely point in this direction, I think.

Making the specifications is still hard, and checking how well results match those specifications is still hard.

I don't think the model will figure that out on its own, because the human in the loop is the verification method for saying whether it's doing better or not and, more importantly, for defining "better".