This is the stealth team I hinted at in a comment on here last week about the "Dark Factory" pattern of AI-assisted software engineering: https://news.ycombinator.com/item?id=46739117#46801848

I wrote a bunch more about that this morning: https://simonwillison.net/2026/Feb/7/software-factory/

This one is worth paying attention to. They're the most ambitious team I've seen exploring the limits of what you can do with this stuff. It's eye-opening.

This right here is where I feel most concerned:

> If you haven’t spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement

Seems to me like if this is true, I'm screwed whether I want to "embrace" the "AI revolution" or not. No way my manager's going to approve me blowing $1,000 a day on tokens; they budgeted $40,000 for our team to explore AI for the entire year.

Let alone from a personal perspective, where I'm screwed because I don't have $1,000 a month in the budget to blow on tokens, thanks to pesky things that also demand financial resources, like a mortgage and food.

At this point it seems like damned if I do, damned if I don't. Feels bad man.

Yeah, that's one part of this that didn't sit right with me.

I don't think you need to spend anything like that amount of money to get the majority of the value they're describing here.

Edit: added a new section to my blog post about this: https://simonwillison.net/2026/Feb/7/software-factory/#wait-...

This is the part that feels right to me because agents are idiots.

I built a tool that writes (non-shit) reports from unstructured data, used internally by analysts at a trading firm.

It cost between $500 and $5,000 per day per seat to run.

It could have cost a lot more, but latency matters in market reports in a way it doesn't for software. I imagine they are burning $1,000 per day per seat because they can't afford more.

They are idiots, but getting better. Ex: I wrote an agent skill to do some read-only stuff on a container filesystem. Stupid, I know; it's like a maintainer script that can make recommendations, whatever.

Another skill, called skill-improver, tries to reduce a skill's token usage by finding deterministic patterns in it that can be scripted, then writes and packages the script.

Putting them together, the container-maintenance thingy improves itself every iteration, validated with automatic testing. It works perfectly about 3/4 of the time, kinda works most of the rest, and fails spectacularly now and then.
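The loop is roughly this, sketched in Python; the helpers are made-up stand-ins for the real agent and test calls, not anything real:

    from dataclasses import dataclass

    @dataclass
    class RunResult:
        tokens_used: int
        passed_tests: bool

    def run_skill(skill: str) -> RunResult:
        # Stub: pretend each scripted (deterministic) step saves tokens
        # versus prompting the model for it.
        scripted = skill.count("# scripted")
        return RunResult(tokens_used=1000 - 100 * scripted, passed_tests=True)

    def propose_script(skill: str) -> str:
        # Stub: the skill-improver finds a deterministic step and scripts it.
        return skill + "\n# scripted step"

    def improve_skill(skill: str, iterations: int = 5) -> str:
        # Keep a candidate only if it passes the tests AND burns fewer tokens.
        best, best_cost = skill, run_skill(skill).tokens_used
        for _ in range(iterations):
            candidate = propose_script(best)
            result = run_skill(candidate)
            if result.passed_tests and result.tokens_used < best_cost:
                best, best_cost = candidate, result.tokens_used
        return best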

It’s only going to get better, and this fit within my Max plan usage while coding other stuff.

LLMs are idiots and they will never get better because they have quadratic attention and a limited context window.

If the tokens that need to attend to each other are on opposite ends of the code base the only way to do that is by reading in the whole code base and hoping for the best.

If you're very lucky you can chunk the code base in such a way that the chunks pairwise fit in your context window and you can extract the relevant tokens hierarchically.
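That hierarchical extraction is basically recursive map-reduce. A rough sketch, with summarize() standing in for whatever LLM call you'd actually make and a crude token estimate:

    CONTEXT_WINDOW = 8_000  # tokens; substitute your model's real limit

    def tokens(text: str) -> int:
        return len(text) // 4  # crude ~4-chars-per-token estimate

    def summarize(text: str, query: str) -> str:
        # Placeholder: ask the model for the parts of `text` relevant to `query`.
        return text[: len(text) // 4]  # stub that pretends to compress 4x

    def hierarchical_extract(chunks: list[str], query: str, max_rounds: int = 10) -> str:
        for _ in range(max_rounds):
            merged = "\n\n".join(chunks)
            if tokens(merged) <= CONTEXT_WINDOW:
                return summarize(merged, query)  # everything finally fits in one window
            # Map step: compress each chunk to what's relevant, and hope the
            # tokens that needed to attend to each other weren't split apart.
            chunks = [summarize(c, query) for c in chunks]
        raise RuntimeError("codebase wouldn't compress")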

If you're not? Well, get reading, monkey.

Agents, md files, etc. are band-aids to hide this fact. They work great until they don't.

I wonder if this is just a byproduct of factories being very early and very inefficient. Yegge and Huntley both acknowledge that their experiments in autonomous factories are extremely expensive and wasteful!

I would expect cost to come down over time, using approaches pioneered in the field of manufacturing.

[deleted]

My friend works at Shopify and they are 100% all in on AI coding. They let devs spend as much as they want on whatever tool they want. If someone ends up spending a lot of money, they ask them what is going well and please share with others. If you’re not spending they have a different talk with you.

As for me, we get Cursor seats at work, and at home I have a GPU, a cheap Chinese coding plan, and a dream.

What results are you getting at home?

> If someone ends up spending a lot of money, they ask them what is going well and please share with others. If you’re not spending they have a different talk with you.

Make a "systemctl start tokenspender.service" and share it with the team?

> I have a GPU, a cheap Chinese coding plan, and a dream

Right in the feels

I get $200 a month, I do wish I could get $1000 and stop worrying about trying the latest AI tools.

> No way my manager's going to approve me to blow $1000 a day on tokens, they budgeted $40,000 for our team to explore AI for the entire year.

To be fair, I’ll bet many of those embracing concerning advice like that have never worked for the same company for a full year.

Maybe the point is that one engineer replaces 10 engineers by using the dark factory, which by definition doesn't need humans.

And then he gets replaced by a new hire when he asks for a raise.

The great hope of CEOs everywhere.

Same. Feels like it goes against the entire “hacker” ethos that brought me here in the first place. That sentence actually made me feel physically sick on first read as well. Every day now feels like a day where I have exponentially less and less interest in tech. If all of this AI that’s burning the planet is so incredible, where are the real-world, tangible improvements? I look around right now and everything in tech, software, internet, etc. has never looked so much like a dumpster fire of trash.

Yes, exactly this. My biggest issue is how incurious the approach seems. Setting a "no-look" policy seems cutting-edge for two seconds, but it prevents any actual learning about how and why things fail when you have all the details. They are just hamstringing their own learning.

We still need to specify precisely what we want built. All we know from this post is what they aren't doing and that they are pissing money away on LLMs. I want to know how they maintain control and specificity, share control and state between employees, handle conflicts and errors, manage design and architectural choices, etc.

All of this seems fun when hacking out a demo, but how in the world does this make sense when there are outside influences, requirements, or context that need to be considered, workflows that need to be integrated, scaling that needs to happen in a certain way, or any of the other actual concerns software has when it isn't built in a bubble?

The biggest rewards for human developers came from building addictive eyeball-getters for adverts so I don’t see how we can expect a very high bar for the results of their replacement AI factories. Real-world and tangible just seem completely out of the picture.

Maybe think about it like this: a dev is ~$1k per day. If the tool gives you 3x the output, then 2x the cost is fine.

(The current cost of $1k is "real", and ultimately, even if you tinker on your own, you're paying this in opportunity cost.)
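Back-of-envelope, taking those numbers at face value:

    dev_cost = 1000       # $/day, fully loaded engineer (assumed)
    token_cost = 1000     # $/day on tokens, i.e. 2x total cost
    multiplier = 3        # assumed productivity gain

    before = dev_cost / 1                           # $1000 per unit of output
    after = (dev_cost + token_cost) / multiplier    # ~$667 per unit of output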

((caveats, etc))

I read that as combined, up to this point in time. You have 20 engineers? If you haven't spent at least $20k up to this point, you've not explored or experienced enough of the ins and outs to know how best to optimize the use of these tools.

I didn't read that as you need to be spending $1k/day per engineer. That is an insane number.

EDIT: re-reading... it's ambiguous to me. But perhaps they mean per day, every day. This will only hasten the elimination of human developers, which I presume is the point.

I think corporate incentives vs personal incentives are slightly different here. As a company trying to experiment in this moment, you should be betting on token cost not being the bottleneck. If the tooling proves valuable, $1k/day per engineer is actually pretty cheap.

At home on my personal setup, I haven't even had to move past the cheapest codex/claude code subscription because it fulfills my needs ¯\_(ツ)_/¯. You can also get a lot of mileage out of the higher tiers of these subscriptions before you need to start paying the APIs directly.

How is 1k/day cheap? Even for a large company?

Takes like this are just baffling to me.

For one engineer that is ~260k a year.

In big companies there is always waste; it's just not possible to be super efficient when you have tens of thousands of people. It's one thing in a steady-state, low-competition business where you can refine and optimize processes so everyone knows exactly what their job is, but that is generally not the environment that software companies operate in. They need to be able to innovate and stay competitive, never more so than today.

The thing with AI is that it ranges from net-negative to easily brute-forcing tedious things that we would never have considered wasting human time on. We can't figure out where the leverage is unless all the subject-matter experts in their various organizational niches really check their assumptions and get creative about experimenting, just trying different things that may never have crossed their minds before. Obviously over time best practices will emerge and get socialized, but with the rate at which AI has been improving lately, it makes a lot of sense to just give employees carte blanche to explore. Soon enough there will be more scrutiny and optimization, but that doesn't really make sense without a better understanding of what is possible.

I assumed that they are saying that you spend $1k per day and that makes the developer as productive as some multiple of the number of people you could hire for that $1k.

I do not really agree with the below, but the logic is probably:

1) Engineering investment at companies generally pays off in multiples of what is spent on engineering time. Say you pay 10 engineers $200k / year each and the features those 10 engineers build grow yearly revenue by $10M. That’s a 4x ROI and clearly a good deal. (Of course, this only applies up to some ceiling; not every company has enough TAM to grow as big as Amazon).

2) Giving engineers near-unlimited access to token usage means they can create even more features, in a way that still produces positive ROI per token. This is the part I disagree with most. It’s complicated. You cannot just ship infinite slop and make money. It glosses over massive complexity in how software is delivered and used.

3) Therefore (so the argument goes) you should not cap tokens and should encourage engineers to use as many as possible.

Like I said, I don’t agree with this argument. But the key thing here is step 1. Engineering time is an investment to grow revenue. If you really could get positive ROI per token in revenue growth, you should buy infinite tokens until you hit the ceiling of your business.
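To make step 1 concrete, and to show what step 3 is really assuming (the post's own numbers, plus my assumption of ~260 working days a year):

    engineers = 10
    salary = 200_000                        # $/yr each
    revenue_growth = 10_000_000             # $/yr from their features (step 1's premise)

    spend = engineers * salary              # $2M
    roi = (revenue_growth - spend) / spend  # 4.0 -- the "4x ROI" above

    # Step 3's leap: add $1k/day/engineer in tokens (~260 working days/yr)
    token_spend = engineers * 1_000 * 260   # $2.6M/yr, more than the salaries
    breakeven = spend + token_spend         # revenue growth must now clear $4.6M
                                            # just to stay ROI-positive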

Of course, the real world does not work like this.

Right, I understand of course that AI usage and token costs are an investment (probably even a very good one!).

But my point is more that saying $1k a day is cheap is ridiculous, even for a company that expects an ROI on that investment. There are risks involved and, as you said, diminishing returns on software output.

I find AI bros’ view of the economics of AI usage strange. It’s reasonable to me to say you think it’s a good investment, but to say it’s cheap is a whole different thing.

Oh sure. We agree on all you said. I wouldn’t call it cheap either. :)

The best you can say is “high cost but positive ROI investment.” Although I don’t think that’s true beyond a certain point either, certainly not outside special cases like small startups with a lot of funding trying to build a product quickly. You can’t just spew tokens about and expect revenue to increase.

That said, I do reserve some special scorn for companies that penny-pinch on AI tooling. Any CTO or CEO who thinks a $200/month Claude Max subscription (or equivalent) for each developer is too much money to spend really needs to rethink their whole model of software ROI and costs. You’re often paying your devs >$100k/yr and you won’t pay ~$2.4k/yr to make them more productive? I understand there are budget and planning-cycle constraints, blah blah, but… really?!

Until there's something verifiable it's just talk. Talk was cheap. Now talk has become an order of magnitude cheaper since ChatGPT.

Yet they have produced almost nothing. You can give $10k to a couple of college grads and get a better product.

[deleted]

Can you make an ethical declaration here, stating whether or not you are being compensated by them?

Their page looks to me like a lot of invented jargon and pure narrative. Every technique is just a renamed existing concept. Digital Twin Universe is mocks, Gene Transfusion is reading reference code, Semport is transpilation. The site has zero benchmarks, zero defect rates, zero cost comparisons, zero production outcomes. The only metric offered is "spend more money".

Anyone working honestly in this space knows 90% of agent projects are failing.

The main page of HN now has three to four posts daily with no substance, just Agentic AI marketing dressed as engineering insight.

With Google, Microsoft, and others spending $600 billion over the next year on AI, panicking to get a return on that capex, and now paying influencers over $600K [1] to manufacture AI enthusiasm to justify the infrastructure spend, I won't engage with any AI thought leadership that lacks a clear disclosure of financial interests and reproducible claims backed by actual data.

Show me a real production feature built entirely by agents with full traces, defect rates, and honest failure accounting. Or stop inventing vocabulary and posting vibes charts.

[1] - https://news.ycombinator.com/item?id=46925821

> Every technique is just a renamed existing concept. Digital Twin Universe is mocks, Gene Transfusion is reading reference code, Semport is transpilation. The site has zero benchmarks, zero defect rates, zero cost comparisons, zero production outcomes. The only metric offered is "spend more money".

Repeating for emphasis, because this is the VERY obvious question anyone with a shred of curiosity would be asking not just about this submission but about what is CONSTANTLY on the frontpage these days.

There could be a very simple four-question questionnaire that could eliminate 90+% of AI coding requests before they start:

- Is this a small wrapper around just querying an existing LLM?

- Does a brief summary of this searched with "site:github" already return dozens or hundreds of results?

- Is this a classic scam (pump & dump, etc.) redone using "AI"?

- Is this needless churn between already-high-level abstractions of technology (dashboard of dashboards, YAML to JSON, Python to JavaScript, automation of automation frameworks)?

Simon does have a disclosure on his site about not being compensated for anything: https://simonwillison.net/about/#disclosures

Thank you. That link discloses there was at least one instance where OpenAI paid for his time.

I will reformulate my question to ask instead whether the page is still 100% correct or needs an update.

It's current. I last modified it in October: https://github.com/simonw/simonwillisonblog/commits/main/tem...

Thank you. Your disclosure page is better than those of all other AI commentators, as most disclose nothing at all. You do disclose an OpenAI payment, Microsoft travel, and the existence of preview relationships.

However I would argue there are significant gaps:

- You do not name your consulting clients. You admit to doing ad-hoc consulting and training for unnamed companies while writing daily about AI products. Those client names are material information.

- You receive non-cash payments that have monetary value. Free API credits, weeks of early preview access, flights, hotels, dinners, and event invitations are all compensation. Do you keep those credits?

- The “I have not accepted payments from LLM vendors” line could still be true while receiving things worth thousands of dollars. Please note I am not saying you did.

- You have a structural conflict. Your favorable coverage will mean preview access, then exclusive content, then traffic, then sponsors, then consulting clients.

- You appeared in an OpenAI promotional video for GPT-5 and were paid for it. This is influencer marketing by any definition.

- Your quotes are used as third-party validation in press coverage of AI product launches. This is a PR function with commercial value to these companies.

The FTC's revised Endorsement Guides explicitly apply to bloggers, not just social media influencers. The FTC defines material connection to include not only cash payments but also free products, early access to a product, event invitations, and appearing in promotional media, all of which would seem to apply here.

The FTC's own "Disclosures 101" guide states [2]: "...Disclosures are likely to be missed if they appear only on an ABOUT ME or profile page, at the end of posts or videos, or anywhere that requires a person to click MORE."

https://www.ftc.gov/business-guidance/resources/disclosures-...

[2] - https://www.ftc.gov/system/files/documents/plain-language/10...

I would argue an ecosystem of free access, preview privileges, promotional video appearances, API credits, and undisclosed consulting does constitute a financial relationship that should be more transparently disclosed than "I have not accepted payments from LLM vendors."

The problem with naming my consulting clients is that some of them won't want to be named. I don't want to turn down paid work because I have a popular blog.

I have a very strong policy that I won't write about someone because they paid me to do so, or asked me to as part of a consulting engagement. I guess you'll just have to trust me that I'll hold to that. I like to hope I've earned the trust of most of my readers.

I do have a structural conflict, which is one of the reasons my disclosures page exists. I don't value things like early access enough to avoid writing critically about companies, but the risk of subtle bias is always there. I can live with that, and I trust my readers can live with it too.

I've found myself in a somewhat strange position where my hobby - blogging about stuff I find interesting - has somehow grown to the point that I'm effectively single-handedly running an entire news agency covering the world's most valuable industry. As a side-project.

I could commit to this full-time and adopt full professional journalist ethics - no accepted credits, no free travel etc. I'd still have to solve the revenue side of things, and if I wrote full time I'd give up being a practitioner which would damage my ability to credibly cover the space. Part of the reason people trust me is that I'm an active developer and user of these tools.

On top of that, some people default to believing that the only reason anyone would write anything positive about AI is if they were being paid to do so. Convincing those people otherwise is a losing battle, and I'm trying to learn not to engage.

So I'm OK with my disclosures and principles as they stand. They may not get a 100% pure score from everyone, but they're enough to satisfy my own personal ethics.

I have just added disclosures links to the footer to make them easier to find - thanks for the prod on that: https://github.com/simonw/simonwillisonblog/commit/95291fd26...

The problem with these "shill for an AI company" theories is that it really doesn't matter how good their shilling or salesmanship is. They actually do need to provide value for it to be successful.

These aren't tools they're asking $25,000 upfront for, where they can trick us that it for-sure-definitely works, take the huge lump sum, and run.

Nah... at best they get a few dollars upfront for us to try it out. Then what? If it doesn't deliver on their promise, it flops.

>> at best they get a few dollars upfront for us to try it out.

The hyperscalers are spending $600 billion a year, and literally betting their companies' futures, on what will happen over the next 24 months... but the bloggers are all doing it for philanthropy and to play with cool tech... Got it.

It doesn't matter

Let's say super-popular blogger X is paid a million dollars to shill for AI and they convince you it's revolutionary. What then? Well, of course you try it! You pay OpenAI $20 for a month.

What happens after that, the actual experience of using the product, is the only important thing. If it sucks and provides no value to anyone, OpenAI fails. Sleazy marketing and salesmen can only get you in the door. They can't make a shit product amazing.

A $10,000 get-rich-quick course can be made successful on hopes, dreams, and sales tactics. A monthly subscription tool to help people with their work crashes and burns if it doesn't provide value.

It doesn't matter how many people shill for it

This is logical, but it relies on the purchaser being able to evaluate whether the tool sucks or not. Each blogger or advertisement hyping it promotes the idea of how automatic, transformative, and intelligent these tools are. The decision makers spending the money, such as execs, VPs, or directors, begin to lose any clear boundary on what AI is and what it can or can't do. So they write the check rather than miss out; it's human nature to follow the pack.

My managers/bosses are non-technical, so for them watching an agent write Python code to scrape a website is like magic, because it's beyond what they know. And while it's not a large upfront cost, it may take a while to see the errors or critical biases in a system one doesn't understand.

So I would argue it's more devious, because it's hard to measure if it's really what it's marketed to be, but it sure feeeeels like it to less technical people.

This is more about large-scale corporate adoption; what you say is true for individual engineers, imo.

Some of us bloggers have been writing about cool tech for 20+ years already. We didn't need to get paid to do it then, why should we need to be paid now?

Simon Willison has publicly posted many times that he finds it frustrating that people call him a shill for the AI industry

I don't think it's unreasonable to say that the list you enumerated goes beyond simply being enthusiastic about a new technology.

[dead]

It is tempting to be stealthy when you start seeing discontinuous capabilities go from totally random to somewhat predictable. But most of the key stuff is on GitHub.

The moats here are around mechanism design and values (to the extent they differ): the frontier labs are doomed in this world, the commons locked up behind paywalls gets hyper-mirrored, value accrues in very different places, and it's not a nice orderly exponential from a sci-fi novel. It's nothing like what the talking heads at Davos say; Anthropic aren't in the top five groups I know in terms of being good at it; it'll get written off as fringe until one day it happens in like a day. So why be secretive?

You get on the ladder by throwing out Python and JSON and learning lean4; you tie property tests to lean theorems via FFI when you have to; you start building out, from rfl proofs to pretty printers with proven AST properties.
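For the unfamiliar, the flavor of that last step is something like this toy Lean 4 sketch (my invention for illustration, not anyone's actual code): a tiny AST, a pretty printer, and a printer property that holds by computation, so rfl closes it:

    -- Toy example: a tiny expression AST and its pretty printer.
    inductive Expr where
      | lit : Nat → Expr
      | add : Expr → Expr → Expr

    def pretty : Expr → String
      | .lit n   => toString n
      | .add a b => "(" ++ pretty a ++ " + " ++ pretty b ++ ")"

    -- A proven property of the printer: `add` always prints parenthesized.
    -- It holds definitionally, so rfl discharges it.
    theorem pretty_add (a b : Expr) :
        pretty (.add a b) = "(" ++ pretty a ++ " + " ++ pretty b ++ ")" := rfl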

And yeah, the droids run out ahead in little Firecracker VMs, reading from an effect/coeffect attestation graph and writing back to it. Results are saved; useful results are indexed. Human review is about big-picture stuff; human coding is about airtight correctness (and fixing it when it breaks despite your "proof" that had a bug in the axioms).

Programming jobs are impacted, but not as much as people think: droids mostly do what David Graeber called bullshit jobs, and then they're savants (not polymath geniuses) at a few things. In reverse engineering and infosec they'll just run you over; they're fucking going in CIC.

This is about formal methods just as much as AI.