> There's so much work in delivering products that will carry your brand, and then must be supported.
People think otherwise with AI partly because Anthropic kept telling us that they didn't have to write code or review code any more for most of their work. Their agent swarms just comb through their github, slack and wikis to figure out what to do next, and another swarm of agents just review, test, merge, deploy, A/B test, and revert the code. Boris alone merged nearly 300 PRs in the past week (or two?). So the top research labs seem to have broken the productivity seal.
And then they talk about this recursively self-improving AI that is so powerful and so autonomous that they advocate every company be prepared to "pause" the effort. And their Fable/Mythos has this specific restriction, as mentioned in their model card[1], that it will reject requests to tune and train models because, well, you guessed it, the models are too powerful to be used by mere mortals.
[1] We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).
I think taking Anthropic or any company in this space at face value is naive at best though. AGI has been 6 months away for years now. Surely anyone can think this through: Anthropic knows what they're doing with their public-facing repositories, and they know how to make things enabled by their tech seem impressive. I would consider Bun etc. examples of this.
Realistically, nobody intellectually honest really knows.
> People think otherwise with AI partly because Anthropic kept telling us that they didn't have to write code or review code any more for most of their work.
Even if that were 100% true, it only collapses the coding effort to near zero. Anyone who's built and shipped a real product should know that coding is maybe 50% of the work, and on a mature product it can be much less.
even boris says they need people with judgment to manage the agents
i dont write code by hand anymore but shipping something people want is as hard (or maybe harder?) as its ever been
Boris also says he stopped using /plan, he writes loops to write prompts, and he simply asks AI to come up with solutions. He has also said many times that his agents comb their emails, Slack channels, and GitHub issues to come up with things to do. When we combine what he has said, it's hard not to get the impression that he was implying full autonomy for their agents. The only thing the engineers need to do is build the harness and issue approvals, rejections, or suggestions.
I work on a toy project that has exactly one user (me). On its face it's fairly simple. It's a portal to my media server because I didn't like how Plex worked with regard to searching and filtering. I can look for movies or series by director, studio, publisher, etc. I can rate things, I can find highly rated things. It's great, and instead of bugging Plex support to add new features, I just tell Deepseek to do it. I started it before LLMs were prevalent, and now that I have open code I've had Deepseek write and rewrite most of my code and implement new features.
But even with this toy project, and the target market being someone I should know very well (me), I often struggle to figure out what I want the app to do. When I go through brainstorming or grilling sessions it'll often ask me a question about how the product ought to work and I'm just like ¯\_(ツ)_/¯ give me suggestions and I'll let you know.
Genuine creativity is something LLMs struggle with, and it kind of makes sense given their design. If you have a complete plan for a feature, or even just an idea of what the feature should do, that is enough for an LLM. But asking it to think and come up with a new feature idea by itself will always yield mostly extremely basic things you've already thought of. The creativity of deciding "what" to build so it serves a purpose is still very difficult imo, and LLMs are not good at it.
> People think otherwise with AI partly because Anthropic kept telling us that they didn't have to write code or review code any more for most of their work. Their agent swarms just comb through their github, slack and wikis to figure out what to do next, and another swarm of agents just review, test, merge, deploy, A/B test, and revert the code. Boris alone merged nearly 300 PRs in the past week (or two?).
Apart from many other issues with this, heavily subsidized subscription plans won't last forever, and if you start burning your own money on tokens in this way, you'll soon realize it's terribly inefficient.
I’ve been wondering if the “you’re not Google” caveat that applies when learning about Google’s software dev process also applies to Anthropic. Anthropic is a company that A. Has cheap unlimited access to its models and B. Is probably largely insulated from the types of tradeoffs that the rest of industry has had to observe in the post-ZIRP era.
Like did they break through the productivity seal? Or are they willing to spend that much more on it since they see their failure as an existential threat to humanity? I doubt your boss sees your software the same way.
Why try to disrupt software though?
Isn't this the classic "dev wants to do a start-up, has no skills outside dev, so builds a dev tool" trap?
It doesn't need to be an existential threat to humanity - it's an existential threat to their business. They need agentic workflows to work for their business to become profitable. So pouring money into the "no engineers write code anymore, only agents" model is at once R&D, QA, product development, and advertising. They can spend as much of their investors' money on this as they have to because if they can't (sustainably) sell this vision to other companies, their company collapses.
What is post-ZIRP please :-) ?
Zero interest rate policy. When interest rates are near zero you can spend money like it’s free. A lot of what we thought of as normal engineering culture was the result of interest rates being zero.
Tah for that.
> the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)
Holy crap that is dark. I like learning about ML for fun, and now I have to assume that their model is intentionally misinforming me to sabotage my learning? It is absolutely bananas that somebody decided that was ok behavior.
time to support open source and local models
I don’t see how that helps, unless you actually mean open source, rather than open weights like most people do. Without everything that goes into the model, including training data, these things are opaque.
Actual open source is hard without a big war chest that allows you to flagrantly steal the training data.
That may very well be the case. In fact, I'm nearly certain that you're right. But it doesn't change the fact that open weight models are altogether insufficient on a number of important dimensions regarding freedom and transparency. And so often (such as the comment I replied to, I think), even technical people seem to just ignore the difference. Open weights are just weights. No amount of open-washing changes that.
The raw training data is so large that very few parties could host it for free even if there weren't copyright barriers.
But I think you could have a full open source training software pipeline that's set up to work with Wikipedia, Common Crawl, Books3, Library Genesis, Anna's Archive, and whatever other useful data sets people can name. There would just be a step where you have to provide your own copy of Library Genesis (or whatever subset of it you have managed to obtain).
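A rough sketch of that "bring your own copy" step, in Python. Everything here (`DataSource`, `resolve_sources`, the dataset names and paths) is made up for illustration and not taken from any real pipeline:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    """One training data set (hypothetical structure)."""
    name: str
    local_path: Optional[Path]  # None means the user has not supplied a copy


def resolve_sources(sources):
    """Split sources into (ready, missing) by checking for a local copy."""
    ready, missing = [], []
    for src in sources:
        if src.local_path is not None and src.local_path.exists():
            ready.append(src)
        else:
            missing.append(src)
    return ready, missing
```

The pipeline itself would ship with no data at all. It would only train on whatever ends up in `ready` and tell you which sources in `missing` you still have to obtain yourself.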
Someone could write a cyberpunk Three Body Problem with this premise.
They kinda did (though it's more inspired by Trusting Trust than AI)
https://corecursive.com/coding-machines-with-don-and-krystal...
TLDR :-)
This comment is not entirely on point with your comment, it circles around and above it looking for lift though.
If you're not doing work that requires your code to stay in home-nation data centres, Claude for DeepSeek, DeepClaude (https://github.com/aattaran/deepclaude), is a great way to get better at using Claude-like tools for software development. It even does a pretty good job of putting together cover letters for job applications...
Using DeepClaude is much cheaper than using Claude... For hobby projects, I've found it useful. A recipe (for cooking) management app I've made took a couple of hours to put together and cost US$0.50. Claude is far more expensive.
The downsides of DeepClaude for many are:-
- DeepSeek is a Chinese corporation so the Chinese Communist Party may ask for data if it wants it.
- DeepClaude isn't as fast as normal Claude, though it's still pretty fast and I think fast enough (YMMV).
- DeepClaude might not be as optimised for various code issues that Claude may be able to solve more quickly or effectively.
- The same safeguards are probably on DeepSeek, but you won't be "wasting" as much money as you might on using Claude.
Inference-focused hardware (https://www.youtube.com/watch?v=nvPqHoVSenE, AI generated speech) may in the medium term cause a large enough cost/energy reduction for LLM tools like Claude to make local LLMs more attractive.
Inference-focused hardware would make running Open Source models like DeepSeek on local machines far cheaper, and control over safeguards would return to the end user.
Hopefully this leads to a localised LLM provision market where local businesses provide varieties of these "local" LLM services. Here, local could mean on premise through to state or nationally based LLM services. Eventually, government orgs outside of the US may demand this kind of LLM use, in the same way governments legally require data to be stored within national borders for many critical government functions.
A bloke can dream I guess...
...Could affordable inference focused hardware also cause the bottom to fall out of these stock market bending valuations for AI corps and their datacentre obsessions?... Not to mention the societal costs caused by the AI super corps building these data centres. At the moment, they're nearly making a profit... They seem almost like speculative companies... Is that a term?
Anthropic is full of shit.