A workable C compiler is a ~10-50KLOC program, and a fairly simple one at that (batch, with no concurrency or interaction). That Anthropic's swarm of agents wrote 100KLOC before failing is a symptom of the problem. It's certainly possible that many programs are in the sub 5KLOC range, but it's definitely not "most software". Plus, almost no software has this level of detailed spec, ready-made tests, and a selection of existing implementations of the same spec.
My first thought when reading Anthropic's description of the experiment was that it is unrealistically easy. It's hard to come up with realistic jobs in the 10-50KLOC range that would be this easy for an LLM. That it failed only shows how much further we still have to go.
A bit off topic, but see how Anthropic publicity stunts went from "Claude C Compiler" with 100K LOC to the recent Bun Rust rewrite with 1M LOC (10x!) in just 3 months.
I get that it's "novel" creation vs porting, but given that they reported that the C compiler cost them $20k in API costs, the Bun rewrite must be at least $200k, maybe even closer to a million. Pure madness.
Asking an LLM tp change programming language of an implementation is completely different from asking it to code from spec. It's orders of magnitude simpler in practice. I converted some 60kloc of Java to C++ and it works. There were some issues where the Java implementation used runtime reflection because that needs creative workarounds and not all of the C++ translations worked on the first try. And that was my first serious attempt at a task with an LLM. I could likely do better now. An important task simplification here is that a well designed codebase can be converted in small pieces and then joined back together. So the total amount of code converted becomes an irrelevant metric.
Yes, the task is very different, but also it will be months to a year until we know the results of the bun experiment.
I don't know how it could fail - Bun loses popularity among devs? Is it an objective metric? From what I understand, Node.js remains dominant across the industry as a whole, with Deno and Bun mostly used by startups.
Anthropic can always fire the Opus/Mythos token machine gun on any problem (bugs, features, security) to ensure PR success, and there would be plenty of AI-sphere startups already drinking the kool-aid that would consider the whole vibe-coding thing to Bun's benefit.
> Anthropic can always fire the Opus/Mythos token machine gun on any problem (bugs, features, security) to ensure PR success,
Can they, though? They tried and failed to do it in their C compiler experiment. The experimenter wrote: "I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality."
It could fail due to maintenance burden. There is a lot of code now that no one wrote.
Are we assuming, all tests pass == software done?
Do Firefox not have tests? Then how was there over 200 CVEs found?
Are we going to be comfortable running a piece of software that has 1M lines, and who knows how many zero-days will be in it.
Yes, sure they are going to use LLM to find the CVE's, and so will the hackers. You need a day or two to fix the security issue, a hacker just need to put it in use.
And good luck debugging a million line code base.
1M LOC == already failed.
The compiler that claude made went way beyond workable. It could compile the full linux kernel afaik. That is much further even beyond standard C.
People who independently tried to use it reported that it is very much not workable:
- "CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI. However, the build failed at the linker stage with ~40,784 undefined reference errors."(https://github.com/harshavmb/compare-claude-compiler)
- Overall it’s an interesting experiment, and shows the current bleeding edge of Claude’s Opus 4.6 model. However the resulting product is also a clear example of the throwaway nature of projects generated almost entirely by AI code agents with little human oversight. The prototype is really impressive, but there is no real path forward for it to be further developed. It can build the Linux kernel [for RISC-V], which is impressive. It can also build other things… if you are lucky, but you really cannot rely on it to work. (https://voxelmanip.se/2026/02/06/trying-out-claudes-c-compil...)
Anthropic themselves said that the codebase was effectively bricked and that their agents could not salvage it.
Well then as you say a 10-50KLOC C compiler is workable. Could you show me the C compiler that does manage to compile a modern Linux kernel that is of that size?
TCC did several years ago. It could boot Linux from source in under 10 seconds. It's wasn't that big of a C compiler. It's in the 50,000 lines of code range.
This was 20 years ago from what I can find. Beside that Linux now is a vastly different codebase than it was 20 years ago. That effort also did not compile Linux unmodified, it required several changes: https://bellard.org/tcc/tccboot_readme.html.