> Why is it still hard to take an existing fully concrete specification, and an existing test suite, and dump out a working feature-complete port of huge, old, and popular projects? Lots of stuff like this will even be in the training data.

We have a smaller version of that ability already:

- https://simonwillison.net/2025/Dec/15/porting-justhtml/

See also https://www.dbreunig.com/2026/01/08/a-software-library-with-...

I need to write these up properly, but I pulled a similar trick with an existing JavaScript test suite for https://github.com/simonw/micro-javascript and the official WebAssembly test suite for https://github.com/simonw/pwasm

So extrapolating from here, and assuming applications are as easy as libraries and operating systems are as easy as applications... at this rate, with a few people and a weekend you can convert anything to anything else, and the differences between programming languages are very nearly erased. Nice!

And yet it doesn't feel true yet; otherwise we'd see it. Why do you think that is?

Because it's not true yet. You can't convert anything to anything else, but you CAN get good results for problems that can be reduced to a robust conformance suite.

(This capability is also brand new: prior to Claude Opus 4.5 in November I wasn't getting results from coding agents that convinced me they could do this.)

It turns out this works for some pretty big problems, like HTML5 parsers, WebAssembly runtimes, and reduced-scope JavaScript interpreters. You have to be selective though. This won't work for Linux.
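The pattern is easier to see with a concrete harness. Here's a minimal sketch (the `myport` module, the JSON test-case layout, and the `conformance-tests/` directory are all hypothetical, not from any of the projects above) of the kind of loop that turns "port this library" into a single measurable target a coding agent can grind against:

```python
"""Minimal conformance-harness sketch. Everything named here is a
placeholder; the point is the shape of the loop, not the specifics."""
import json
from pathlib import Path

from myport import parse  # assumption: the ported library exposes parse()


def run_suite(cases_dir: str) -> tuple[int, int]:
    """Run every JSON test case under cases_dir, return (passed, total)."""
    passed = total = 0
    for path in sorted(Path(cases_dir).glob("*.json")):
        for case in json.loads(path.read_text()):
            total += 1
            try:
                # Each hypothetical case holds an input and the expected output.
                if parse(case["input"]) == case["expected"]:
                    passed += 1
            except Exception:
                pass  # a crash counts as a failure, not an abort
    return passed, total


if __name__ == "__main__":
    passed, total = run_suite("conformance-tests/")  # hypothetical directory
    rate = passed / total if total else 0.0
    print(f"{passed}/{total} passing ({rate:.1%})")
```

Once "done" is defined as that number hitting 100%, the agent can run the suite, read the failures, patch the port, and repeat without a human judging each intermediate step.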

I thought it wouldn't work for web browsers either - one of my 2026 predictions was "by 2029 someone will build a new web browser using mostly LLM-code"[1] - but then I saw this thread on Reddit, https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr... ("Over christmas break I wrote a fully functional browser with Claude Code in Rust"), took a look at the code, and found it surprisingly deep: https://github.com/hiwavebrowser/hiwave

[1] https://simonwillison.net/2026/Jan/8/llm-predictions-for-202...

> you CAN get good results for problems that can be reduced to a robust conformance suite.

If that's what's been shown, then why doesn't it work on anything that has a sufficiently large test suite, presumably scaling linearly in time with size? Why should we be selective, and based on what?

It probably does. This only became possible over the last six weeks, and most people haven't figured out the pattern yet.