This reads like it could result in "the blind leading the blind". Unless the supervisor AI agents are deterministic, it can still be a crapshoot. Given the resources that SourceGraph has, I'm still surprised they missed the most obvious thing, which is that "context is king" and we need tooling that makes adding context to LLMs dead simple. Basically, we should be optimizing for the humans in the loop.

Agents have their place for trivial and non-critical fixes/features, but the reality is that unless agents can act in a deterministic manner across LLMs, you really are coding with a loaded gun. The worst part is that agents can really dull your senses over time.

I do believe in a future where we can trust agents 99% of the time, but the reality is that we are not training on the thought process needed for this to happen. That is, we are not focused on conversation-to-code training data. I would say 98% of my code is AI generated, and it is certainly not vibe coding. I don't have a term for it, but I am literally dictating to the LLM what I want done and having it fill in the pieces. Sometimes it misses the mark, sometimes it aligns, and sometimes it introduces whole new ideas that I had never thought of, which lead to a better solution. The instructions that I provide are based on my domain knowledge, and I think people are missing the mark when they talk about vibe coding in a professional context.

Full Disclosure: I'm working on improving the "conversation to code" process, so my opinions are obviously biased, but I strongly believe we need to first focus on better capturing our thought process.

I'm skeptical that we would need determinism in a supervisor in order for it to be useful. I realize it's not exactly analogous, but the current human parallel, where senior/principal/architect-level SWEs review code from less experienced devs (or even similarly or more experienced devs), is far from deterministic, yet it certainly improves quality.

Think about how differently a current agent behaves when you say "here is the spec, implement a solution" vs "here is the spec, here is my solution, make refinements". You get very different output, and I would argue that the "check my work" approach tends to have better results.