Am I the only one who wasn’t particularly impressed by AutoResearch? If you looked at what the agent was actually doing, it was mostly just tuning parameters, not really trying novel approaches.

I couldn’t help but see this as mostly a very inefficient variant of hyperparameter optimization. Someone correct me if I’m wrong; I may be looking at this too pessimistically.

I didn't dig into what the actual repository was doing, but I did take some inspiration from the idea after reading about it, and realized I might have been underestimating the ability of LLMs. I put a bit more work into a performance harness I was using locally, set some agents to brainstorming, and they did seem to find some great stuff. So I don't really have a stance one way or another on this specific repo, but the general idea seems like a really good one.

Ever since AlphaEvolve, the idea has been this: if you build a harness that can evaluate solutions, and give LLMs a database where they can keep storing their work and then sample from it, they do find non-trivial solutions over time by learning from their own past ideas.
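For anyone who hasn't seen the loop spelled out, here's a toy sketch of the evaluate/store/sample cycle. It is not AlphaEvolve's actual implementation: `evaluate`, `propose`, and `evolve` are hypothetical names, the "harness" is a made-up scoring function, and the LLM call is replaced by a random perturbation of a sampled parent.

```python
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    # Entries compare by score only, so max()/sorted() rank by quality.
    score: float
    solution: list = field(compare=False)


def evaluate(solution):
    # Hypothetical harness: higher is better, optimum (0.0) at all 3.0s.
    return -sum((x - 3.0) ** 2 for x in solution)


def propose(parents, rng):
    # Stand-in for the LLM call: a real system would show the sampled
    # parents to the model as context; here we just pick one and perturb it.
    base = rng.choice(parents).solution
    return [x + rng.gauss(0.0, 0.5) for x in base]


def evolve(steps=500, dim=4, k=3, seed=0):
    rng = random.Random(seed)
    start = [0.0] * dim
    # The "database" of everything tried so far, with its scores.
    database = [Entry(evaluate(start), start)]
    for _ in range(steps):
        # Bias parent sampling toward the current best entries.
        top = sorted(database, reverse=True)[:10]
        parents = rng.sample(top, min(k, len(top)))
        child = propose(parents, rng)
        # Every attempt is scored and stored, good or bad.
        database.append(Entry(evaluate(child), child))
    return max(database), database


best, database = evolve()
print(len(database), best.score)
```

Swap `propose` for an actual model call and `evaluate` for a real benchmark and you have the general shape; whether the proposals look like hyperparameter tweaks or genuinely new ideas depends entirely on what the model puts into `propose`.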

It is the ultimate manifestation of test-time scaling. I think Karpathy just popularised it.

Karpathy embedded within an organization is way more impressive than him out on his own with hot takes and little projects. I hope he does great things for Anthropic.

Absolutely, I wasn’t saying that him being at Anthropic wasn’t going to be effective, I just think his little projects wouldn’t be very interesting if his name wasn’t attached to them.

> Am I the only one who wasn’t particularly impressed by AutoResearch?

Isn't it just a nerfed AlphaEvolve? https://arxiv.org/abs/2506.13131

Inefficient variants with $100m+ worth of compute will still probably outperform the best teams of researchers.