(I’m one of the people on this team). I joined fresh out of college, and it’s been a wild ride.

I’m happy to answer any questions!

More of a comment than a question:

> Those of us building software factories must practice a deliberate naivete

This is a great way to put it. I've been saying "I wonder which sacred cows are going to need slaughtered," but for those who didn't grow up on a farm, maybe that metaphor isn't the best. I might steal yours.

This stuff is very interesting, and I'm curious to see how it goes for you; I'll eagerly read whatever you end up putting out about this. Good luck!

EDIT: Oh, also, the re-implemented SaaS apps really recontextualize some other stuff I’ve been doing too…

This was an experiment that Justin ran: one person fresh out of college, and another with a long, traditional career.

Even though all three of us have very different working styles, we all seem to be very happy with the arrangement.

You definitely need to keep an open mind, though, and be ready to unlearn some things. I guess I haven’t spent enough time in the industry yet to develop habits that might hinder adopting these tools.

Jay single-handedly developed the digital twin universe. Only one person commits to a codebase :-)

> "I wonder which sacred cows are going to need slaughtered"

Or a vegan or Hindu. Which ethics are you willing to throw away to run the software factory?

I eat hamburgers while aware of the moral issues.

I’ve been building with a similar approach[1], and my intuition is that humans will be needed at some points in the factory line for specific tasks that require expertise/taste/quality. Have you found that to be the case? Where in the process do you find humans should be involved for maximal leverage?

To name one probable area of involvement: how do you specify what needs to be built?

[1] https://sociotechnica.org/notebook/software-factory/

You're absolutely right ;)

Your intuition definitely lines up with how we're thinking about this problem. If you have a good definition of done and a good validation harness, these agents can hill climb their way to a solution.
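To make "definition of done" a bit more concrete, here's a minimal sketch of a validation harness expressed as machine-checkable checks. The specific commands (pytest, ruff, mypy) and names are illustrative assumptions, not our actual harness:

```python
# Illustrative sketch only: a "definition of done" as machine-checkable checks.
# The commands and names are assumptions, not the real harness.
import subprocess
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str

def run_check(cmd: list[str]) -> CheckResult:
    """Run a shell command and treat exit code 0 as a pass."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return CheckResult(" ".join(cmd), proc.returncode == 0, proc.stdout + proc.stderr)

def definition_of_done() -> list[CheckResult]:
    """Every check must pass before a change counts as done."""
    return [
        run_check(["pytest", "-q"]),        # tests
        run_check(["ruff", "check", "."]),  # lint
        run_check(["mypy", "."]),           # types
    ]

if __name__ == "__main__":
    for result in definition_of_done():
        print("PASS" if result.passed else "FAIL", result.name)
```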

But you still need human taste/judgment to decide what you want to build (unless your solution is to just brute force the entire problem space).

For maximal leverage, keep asking "Why am I doing this?" Ask it enough times and you'll arrive at the bottleneck that, for now, only you can solve. As a human, your job is to set the higher-level requirements for what you're trying to build. Coming up with those requirements and then using agents to shape them up is fine, but human judgment is still where we answer what needs to be built. At the same time, I never want to be doing something the models are better at. Until we crack the proactiveness part, we'll be the ones figuring out what to do next.

Also, it looks like you and Danvers are working in the same space; we love trading notes with other teams in this area and would love to connect. You can find my personal email, or reach me at my work email: navan.chauhan [at] strongdm.com

You aren't supposed to read code, but do you from time to time, just to evaluate what is going on?

No. But I do ask questions (in $CODING_AGENT) so that I always have a good mental model of everything I’m working on.

Is it essentially using LLMs as a compiler for your specs?

What do you do if the model isn't able to fulfill the spec? How do you troubleshoot what is going on?

Using models to go from spec to program is one use case, but it’s not the whole story. I’m not hand-writing specs; I use LLMs to iteratively develop the spec, the validation harness, and then the implementation. I’m hands-on with the agents, and hands-off with our workflow style, which we call Attractor.

In practice, we try to close the loop with agents: plan -> generate -> run tests/validators -> fix -> repeat. What I mainly contribute is taste and deciding what to do next: what to build, what "done" means, and how to decompose the work so models can execute. With a strong definition of done and a good harness, the system can often converge with minimal human input. For debugging, we also have a system that ingests app logs plus agent traces (via CXDB).
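As a rough illustration of that closed loop (the agent interface and validator hook here are hypothetical stand-ins; the real workflow is driven by Attractor, not hand-rolled code like this):

```python
# Hypothetical sketch of the plan -> generate -> run validators -> fix -> repeat loop.
# The agent's plan/generate/fix methods are assumptions, not a real API.
from typing import Callable, Protocol

class CodingAgent(Protocol):
    def plan(self, task: str) -> str: ...
    def generate(self, plan: str) -> None: ...
    def fix(self, failures: list[str]) -> None: ...

def closed_loop(
    task: str,
    agent: CodingAgent,
    run_validators: Callable[[], list[str]],  # returns failure messages; empty means done
    max_iterations: int = 20,
) -> bool:
    plan = agent.plan(task)          # plan
    agent.generate(plan)             # generate
    for _ in range(max_iterations):
        failures = run_validators()  # run tests/validators (the definition of done)
        if not failures:
            return True              # converged: definition of done is met
        agent.fix(failures)          # fix, then repeat
    return False                     # didn't converge: hand back to a human
```

The point of the structure is that the exit condition is owned entirely by the validators, so human attention shifts from reviewing code to curating the definition of done.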

The more reps you get, the better your intuition for where models work and where you need tighter specs. You also have to keep updating your priors with each new model release or harness change.

This might not have been a clear answer, but I am happy to keep clarifying as needed!

I know you're not supposed to look at the code, but do you have things in place to measure and improve code quality anyway?

Not just code review agents, but things like "find duplicated code and refactor it"?

A few overnight “attractor” workflows serve distinct purposes (a rough sketch of how such nightly runs might be scheduled follows the list):

* DRYing/Refactoring if needed

* Documentation compaction

* Security reviews
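For a sense of shape, here is a minimal sketch of kicking off those nightly runs; the workflow names, prompts, and agent command are all placeholders rather than our actual setup:

```python
# Hypothetical sketch: launch overnight maintenance workflows on a schedule.
# Prompts and the agent command are placeholders, not the real configuration.
import subprocess

OVERNIGHT_WORKFLOWS = {
    "dry-refactor":    "Find duplicated code and refactor it, keeping all tests green.",
    "doc-compaction":  "Compact and deduplicate the project documentation.",
    "security-review": "Review recent changes for common security issues and write up findings.",
}

def run_overnight(agent_cmd: list[str]) -> None:
    """Run each maintenance workflow once, e.g. from a 2am cron job."""
    for name, prompt in OVERNIGHT_WORKFLOWS.items():
        print(f"[nightly] starting workflow: {name}")
        subprocess.run(agent_cmd + [prompt], check=False)

if __name__ == "__main__":
    # Placeholder invocation; substitute your actual coding-agent CLI.
    run_overnight(["echo"])
```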