There's speculation that next Tuesday will be a big day for OpenAI and possibly GPT 6. Anthropic showed their hand today.

Sounds like a good opportunity to pause spending on the nerfed 4.6, wait for the new model to be released, and then max out over the two weeks before it gets nerfed again.

https://marginlab.ai/trackers/claude-code-historical-perform...

The performance degradation I've seen isn't in quality or completion but in duration: I get good results, just much more slowly than I did before 4.6. Still, it's just anecdata, but a lot of folks seem to feel the same.

Been reading posts like these for 3 years now. There are multiple sites with numbers. I’m willing to buy “I’m paying rent on someone’s agent harness and god knows what’s in the system prompt rn”, but in the face of numbers, you’ve gotta discount the anecdotal.

You're probably right. More likely, for some period of time I forgot that I had switched from Sonnet to the large-context Opus, which wasn't needed for the complexity of my work.

Yeah, why trust your actual experience over numbers? Nothing surer than synthetic benchmarks

Strawman, and, synthetic benchmark? :)

This just looks like random noise to me? Is it also random on short timespans, like running it 10x in a row?

I don't believe that trackers like this are trustworthy. There's an enormous financial motive to cheat and these companies have a track record of unethical conduct.

If I were VP of Unethical Business Strategy at OpenAI or Anthropic, the first thing I'd do is put in place an automated system that flags accounts, prompts, IPs, and usage patterns associated with these benchmarks and directs their traffic to a dedicated compute pool that wouldn't be affected by these changes.

That does not sound very believable. Last time Anthropic released a flagship model, it was followed by GPT Codex literally that afternoon.

Y'all know they're teaching to the test. I'll wait till someone devises a novel test that isn't contained in the training datasets. Sure, they're still powerful.

My understanding is GPT 6 works via synaptic space reasoning... which I find terrifying. I hope if true, OpenAI does some safety testing on that, beyond what they normally do.

From the recent New Yorker piece on Sam:

“My vibes don’t match a lot of the traditional A.I.-safety stuff,” Altman said. He insisted that he continued to prioritize these matters, but when pressed for specifics he was vague: “We still will run safety projects, or at least safety-adjacent projects.” When we asked to interview researchers at the company who were working on existential safety—the kinds of issues that could mean, as Altman once put it, “lights-out for all of us”—an OpenAI representative seemed confused. “What do you mean by ‘existential safety’?” he replied. “That’s not, like, a thing.”

No chance an OpenAI spokesperson doesn't know what existential safety is.

I did not read the response as...

>Please provide the definition of Existential Safety.

I read:

>Are you mentally stable? Our product would never hurt humanity--how could any language model?

The absolute gall of this guy to laugh off a question about x-risks. Meanwhile, also Sam Altman, in 2015: "Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity. There are other threats that I think are more certain to happen (for example, an engineered virus with a long incubation period and a high mortality rate) but are unlikely to destroy every human in the universe in the way that SMI could. Also, most of these other big threats are already widely feared." [1]

[1] https://blog.samaltman.com/machine-intelligence-part-1

Why are these people always like this?

Amusing! Even if they believe that, they should know the company communicated the opposite earlier.

Likely an improvement on:

> We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.

<https://arxiv.org/abs/2502.05171>
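For what it's worth, the core idea in that abstract can be sketched in a few lines: instead of scaling compute by emitting more chain-of-thought tokens, you iterate a recurrent block over a hidden latent state, so "thinking longer" just means more iterations. This is a toy illustration of the mechanism, not the paper's actual architecture; every function name here is mine.

```python
# Toy sketch of latent-space recurrent reasoning (illustration only,
# not the code from arXiv:2502.05171). A real model would use
# transformer blocks over vectors; here each stage is a trivial stand-in.

def embed(x):
    # "prelude": map the input into latent space (trivially, here)
    return [float(v) for v in x]

def recurrent_block(h, e):
    # one latent "reasoning" step: pull the hidden state toward the input
    return [0.5 * hi + 0.5 * ei for hi, ei in zip(h, e)]

def readout(h):
    # "coda": decode the latent state into an output
    return sum(h)

def latent_reason(x, steps):
    e = embed(x)
    h = [0.0] * len(e)
    for _ in range(steps):           # test-time compute scales with `steps`,
        h = recurrent_block(h, e)    # not with the length of the output
    return readout(h)

print(latent_reason([1, 2, 3], steps=1))   # shallow pass: 3.0
print(latent_reason([1, 2, 3], steps=20))  # more iterations converge toward 6.0
```

The point of the shape: the same weights (`recurrent_block`) get unrolled to arbitrary depth at test time, so extra compute improves the answer without producing any intermediate tokens a monitor could read.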

Oh you mean literally the thing in AI2027 that gets everyone killed? Wonderful.

AI 2027 is not a real thing which happened. At best, it is informed speculation.

Funny: if you open their website and go to April 2026, you literally see this: 26B revenue (Anthropic beat 30B) + pro human hacking (mythos?).

I don’t think they’re predictions exactly, but they’ve made great calls up to now.

I agree that they called many things remarkably well! That doesn't change the fact that AI 2027 is not a thing which happened, so it isn't valid to point out "this killed us in AI 2027." There are many reasons to want to preserve CoT monitorability. Instead of AI 2027, I'd point to https://arxiv.org/html/2507.11473.

[deleted]
[deleted]

That sounds really interesting. Do you have some hints on where to read more?

Oh, of course they will /s