> I can’t imagine any other example where people voluntarily move for a black box approach.
Anyone overseeing work from multiple people has to? At some point you have to let go and trust people's judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I'm not one of them btw) must rely on higher-level reports. Maybe drilling into this or that piece of code occasionally.
> At some point you have to let go and trust people's judgement.
Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.
> Reading and understanding the whole output of 9 concurrently running agents is impossible.
I agree. It is also impossible for a person to drive two cars at once… so we don't. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?
I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It's already a giant advancement, so why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.
Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in?
It is not. To review code you need to have an understanding of the problem that can only be built by writing code. Not necessarily the final product, but at least prototypes and experiments that then inform the final product.
An AI agent cannot be held accountable
Neither can employees, in many countries.
> Anyone overseeing work from multiple people has to?
That's not a black box though. Someone is still reading the code.
> At some point you have to let go and trust people‘s judgement
Where's the people in this case?
> People who do that (I'm not one of them btw) must rely on higher-level reports.
Does such a thing exist here? Just "done".
> Someone is still reading the code.
But you are not. That’s the point?
> Where's the people in this case?
Juniors build worse code than Codex. Their superiors also can't check everything they do. They need to extend some level of trust despite the dumb shit, or they can't hire juniors.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info compared to reading the output yourself, but it won't be zero info.
You: "Why did you build this?"
LLM: "Because the embeddings in your prompt are close to some embeddings in my training data. Here's some seemingly explanatory text with that is just similar embeddings to other 'why?' questions."
You: "What could be improved?"
LLM: "Here's some different stuff based on other training data with embeddings close to the original embeddings, but different.
---
It's near zero useful information. Example information might be "it builds" (baseline necessity, so useless info), "it passes some tests" (fairly baseline, more useful, but actually useless if you don't know what the tests are doing), or "it's different" (duh).
If I asked you, about a piece of code that you didn't build, "What would you improve?", how would that be fundamentally different?
> You can definitely ask the agent what it built, why it built it, and what could be improved.
If that was true we’d have what they call AGI.
So no, it doesn't actually give you those, since it can't reason and apply logic in that way.
What does that have to do with AGI?
What you asked for is AGI. How else would it think, reason, and apply logic to answer your "why"?
It doesn't do that currently even if you think it does.