There’s nothing much new about the architecture. The real gains come from the usage traces.

It turns out that having a text-based interface for a text-trained model creates a very nice feedback loop.

Right now, as we speak, people are generating text traces on Anthropic and OpenAI servers that teach their models to do everything under the sun, text-wise.

So the people getting super mad right now at how dumb the model is while reverse-engineering a super complex function from a binary, the ones writing “stop, you dumb robot, you are going wrong, go this way thank you very much”, are actually leaving a lesson in the form of the "chat" text history.

Some may say that each bad word gets us closer to ASI.

That, and obviously the order-of-magnitude more efficient GPUs we got, which allow for different tradeoffs at training time.

Makes me wonder: as people grow to trust the AI more and more, not reading the code, barely skimming the implementation plans, and simply rerolling if something doesn't work, will the value of these chats erode? Thinking back 1-1.5 years, I was closely monitoring what these agents did and steering them quite aggressively. These days, not so much. Where will RL signals come from as it gets ever closer to human capabilities? How well does self-play work for coding? What about multistep tasks, where it isn't just about being good at a single task but about evolving a codebase over time in the face of changing requirements?

Over a large sample size, simply getting feedback of "Did this work for me, y/n" is valuable even if the specific details are missing and even if the overall tasks are complicated and multifaceted.
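To make the "did this work, y/n" point concrete, here is a minimal sketch of how a bare success count tightens with sample size. It uses a standard Wilson score interval; the function name and the example counts are mine, purely illustrative, and say nothing about how any lab actually processes feedback.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binary success rate."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts: the same 70% "it worked" rate at two sample sizes.
small = wilson_interval(70, 100)
large = wilson_interval(7_000, 10_000)
print("n=100   ->", small)
print("n=10000 ->", large)
```

The second interval is much narrower than the first, which is the whole argument: each y/n carries very little detail, but enough of them pin the rate down.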

Not sure, but in my experience, instead of asking for code, I'm asking for solutions and providing a kubectl configured to reach my cluster and an az monitor command to read the logs and telemetry.

A typical session is the agent establishing a metrics and log baseline, creating the code, compiling, deploying, observing, fixing, redeploying, observing metrics, determining the outcome and committing.
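Roughly, the loop looks like the sketch below. Everything here is a stand-in: `read_metrics`, `build_and_deploy` and `commit` are hypothetical callables (in my case they would wrap kubectl and az monitor, but none of that is shown), and "metric no worse than baseline" is a simplifying assumption for what "determining the outcome" means.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    committed: bool
    attempts: int


def run_session(
    read_metrics: Callable[[], float],       # hypothetical: e.g. current error rate
    build_and_deploy: Callable[[int], None], # hypothetical: write, compile, deploy attempt N
    commit: Callable[[], None],              # hypothetical: commit the working change
    max_attempts: int = 3,
) -> Outcome:
    """Baseline, then deploy/observe/fix until the metric is no worse than baseline."""
    baseline = read_metrics()
    for attempt in range(1, max_attempts + 1):
        build_and_deploy(attempt)
        observed = read_metrics()
        if observed <= baseline:
            commit()
            return Outcome(committed=True, attempts=attempt)
        # Otherwise the next iteration is the "fix and redeploy" step.
    return Outcome(committed=False, attempts=max_attempts)
```

The point is that the pass/fail signal comes from the observed metrics, not from anyone reading the diff.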

I really, really, don't look at the code anymore.

UPDATE:

So my point is: it won't have me stewarding the code anymore, but it will have the infrastructure (and ultimately the real world) providing feedback on the traces.

The only reason I still read the output at my day job is because I still need to send it to another human for review, and I'd be embarrassed and ashamed if I let some slop through. For my hobby projects... there are definitely parts where I don't know how they work.

Maybe we need some form of long-term training. How long does the code that the AI wrote stick around before being rewritten?

I guess we can do this retroactively too: if we could somehow tag AI-written lines of code in the VCS, then in a couple of years we could check which parts lasted.
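A rough sketch of what that check could look like, assuming you already have per-line records of who wrote a line, when it was added, and when (if ever) it was removed. The `LineRecord` shape is hypothetical; extracting it from a real VCS is the hard part and isn't shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class LineRecord:
    ai_written: bool          # hypothetical tag, e.g. derived from commit metadata
    added: date
    removed: Optional[date]   # None means the line is still in the codebase


def survival_rate(
    records: Iterable[LineRecord],
    horizon_days: int,
    as_of: date,
    ai_written: bool,
) -> Optional[float]:
    """Fraction of lines of the given origin that lasted at least horizon_days."""
    eligible = 0
    survived = 0
    for r in records:
        if r.ai_written != ai_written:
            continue
        if (as_of - r.added).days < horizon_days:
            continue  # too young to judge against this horizon
        eligible += 1
        end = as_of if r.removed is None else r.removed
        if (end - r.added).days >= horizon_days:
            survived += 1
    return survived / eligible if eligible else None
```

Comparing `survival_rate(..., ai_written=True)` against `ai_written=False` over the same horizon would give a crude "which parts lasted" signal, with the obvious caveat that long-lived code isn't necessarily good code.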

> There’s nothing much new about the architecture. The real gains come from the usage traces.

Sorry, how do you know? I am so curious about where exactly the gains are coming from, but it's so hard to get even a little bit of insight.

I wish the govt would fund these labs and make it free and open source. Way better investment than stupid overseas wars.

> I wish the govt would fund these labs and make it free and open source.

It would be impossible for the govt to allocate this much capital towards such a moonshot, and even if they could, they would do it in a way that would get 90% of it frittered away to fraud and waste.

I have excellent news for you. Lux @ ORNL and Equinox @ Argonne are to be completed by EOY, with Solstice (100k NVIDIA chips, currently spec'd to be Vera Rubins) in the next five years.

https://www.whitehouse.gov/presidential-actions/2025/11/laun...

What makes you so sure? There have been massively successful government-funded and government-run projects before. The Soviets beat the Americans to space, after all.