> Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
Actually, I've had some terrible experiences asking the agent to do something simple in our codebase (like renaming some files and fixing the build scripts and dependencies). It took much longer than a human would have, because it kept running the full CI pipeline after every attempted change to check for problems.
A human would instead, for example, rely on the linter to catch basic issues and run a partial build of only the affected targets to save time. But the agent probably has no sense of how much time has elapsed.
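The "partial build on affected targets" shortcut boils down to: find the targets that own the changed files, then add everything that transitively depends on them. A minimal sketch of that idea, assuming a hypothetical hard-coded dependency graph (`BUILD_GRAPH` and `affected_targets` are made up for illustration; a real build system would supply this graph through its own tooling):

```python
from collections import deque

# Hypothetical build graph: target -> (source path prefix, targets it depends on).
BUILD_GRAPH = {
    "core": ("src/core/", []),
    "api": ("src/api/", ["core"]),
    "cli": ("src/cli/", ["api"]),
    "docs": ("docs/", []),
}

def affected_targets(changed_files, graph=BUILD_GRAPH):
    """Return targets whose sources changed, plus everything that depends on them."""
    # Targets that directly own at least one changed file.
    direct = {
        name
        for name, (prefix, _) in graph.items()
        for path in changed_files
        if path.startswith(prefix)
    }
    # Invert the edges: dependency -> targets that depend on it.
    dependents = {name: [] for name in graph}
    for name, (_, deps) in graph.items():
        for dep in deps:
            dependents[dep].append(name)
    # Breadth-first walk over reverse dependencies.
    result = set(direct)
    queue = deque(direct)
    while queue:
        for parent in dependents[queue.popleft()]:
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return sorted(result)

print(affected_targets(["src/api/routes.py"]))  # ['api', 'cli']
```

Touching a file under `src/api/` rebuilds only `api` and `cli`, skipping `core` and `docs`, which is the kind of time saving a full CI run after every change throws away.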
Went through something similar recently with database calls.
Copilot said something about too many rows being returned and came up with a complex way to reduce the row count.
I just added a "LIMIT 100", which was more than adequate.
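For anyone who hasn't used it, `LIMIT` simply caps how many rows the query returns. A minimal sketch using Python's built-in `sqlite3` with an in-memory database; the `events` table and its contents are hypothetical stand-ins, since the comment doesn't say what the actual query was:

```python
import sqlite3

# Hypothetical table standing in for the query that returned "too many rows".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"event {i}",) for i in range(10_000)],
)

# Without a limit, every matching row comes back.
all_rows = conn.execute("SELECT id, payload FROM events").fetchall()

# LIMIT caps the result set; ORDER BY makes which rows you get deterministic.
capped = conn.execute(
    "SELECT id, payload FROM events ORDER BY id LIMIT 100"
).fetchall()

print(len(all_rows), len(capped))  # 10000 100
conn.close()
```

Pairing `LIMIT` with `ORDER BY` is optional, but without it the database is free to return any 100 matching rows.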
Can't this be solved with something like "Don't run any CI commands" in AGENTS.md?
Except for the times you do want it to run the CI.
LLM issues can often be solved by being more and more specific, but at some point, being specific enough is just as time-consuming as jumping in and doing it yourself.