I'd love to hear some commentary on my idea for addressing this problem of AI PRs.
Why not restrict the agents to writing tests only?
If the tickets are written concisely, any feature request or fix could be reduced to the necessary spec files.
This way, any maintainer would be tasked with reviewing the spec files and writing the implementation.
CI is pretty good at gatekeeping based on test suites passing...
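To make the workflow concrete, here's a sketch of what an agent-submitted spec file might look like for a hypothetical ticket (the `slugify` function and its rules are made up for illustration). The maintainer reviews the assertions, then writes the implementation themselves to turn the suite green:

```python
# Hypothetical ticket: "slugify() should lowercase, join words with
# hyphens, strip surrounding whitespace, and reject empty input."
# The agent's PR would contain only the test functions below; the
# maintainer critiques them before implementing anything.

def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_whitespace():
    assert slugify("  padded  ") == "padded"

def test_slugify_rejects_empty_input():
    try:
        slugify("")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")

# A reference implementation the maintainer might write afterwards
# (not part of the agent's PR), just to show the red/green cycle:
def slugify(text: str) -> str:
    cleaned = text.strip().lower()
    if not cleaned:
        raise ValueError("empty input")
    return "-".join(cleaned.split())

test_slugify_lowercases_and_hyphenates()
test_slugify_strips_whitespace()
test_slugify_rejects_empty_input()
```

A spec this small is quick to review: the maintainer can see at a glance whether the asserted behavior matches the ticket before committing to an implementation.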
Quis custodiet ipsos custodes?
If the problem is that we don't trust people who use AI without understanding its output, and we base the gatekeeping on tests written by AI, then how can we trust that output?
Isn't that the purpose of red/green refactoring though? To establish working software that guards against regressions and builds trust (in the software)?
If your premise is that people would shift to using AI to write tests they don't understand, then that's not necessarily a failing of the contributor.
The contributor might not understand the output, but the maintainer would be able to critique a spec file and determine pretty quickly if implementation would be worthwhile.
This would necessitate small tickets, which in turn would create small spec files and make review easier for maintainers.
Also, any PR that included a non-spec file could be dismissed out of hand.
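That gate is easy to automate. A minimal sketch of such a CI check, assuming the pipeline supplies the list of changed paths (e.g. via `git diff --name-only`); the directory layout and function names here are illustrative:

```python
# Minimal CI gate: fail the build if a PR touches anything other
# than spec files. ALLOWED_PATTERNS reflects a hypothetical repo
# layout; adjust to the project's actual test directories.
import fnmatch

ALLOWED_PATTERNS = ["tests/*.py", "tests/**/*.py"]

def only_spec_files(changed_paths):
    """Return True if every changed path matches an allowed pattern."""
    return all(
        any(fnmatch.fnmatch(path, pattern) for pattern in ALLOWED_PATTERNS)
        for path in changed_paths
    )

print(only_spec_files(["tests/test_slugify.py"]))        # True
print(only_spec_files(["tests/test_api.py", "app.py"]))  # False: touches app.py
```

In practice this would run as an early CI step that exits non-zero, so spec-only PRs from agents get past the gate and anything touching implementation files is rejected before a human ever looks at it.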
It is possible for users of AI to learn from reading specs.
But if agents are doing the entire thing (reading the ticket, generating the PR, submitting the PR)...then the point of people not understanding is moot.
From my experience, you can't trust the agent to do the entire thing unless you set up very heavy linters, quality-control systems (e.g. SonarQube), and a long list of other things, because AI tends to produce pretty bad code: repetition, unused code, lack of structure... basically all the things we've spent decades learning not to do. And then there's the point where you hit a pretty obscure bug that you can only solve with a deep understanding of the code, which you won't have, because you delegated that to an agent.
I like agentic programming, I use it, but I review everything that the agent does and frequently spend a few cycles simply telling the agent to refactor the code because it constantly produces technical debt.
That's not helpful, because:
1: LLMs can write awful tests.
2: LLMs can write very useful code, especially when they are working in well-understood areas.
Which comes down to understanding that the LLM is a tool, and it's the job of the programmer to know how to use the LLM and evaluate its output.