Are you hand-fixing the issues or having AI do it? I've found that second-pass quality is miles ahead of the initial implementation. If you're experienced, you'll know exactly where the code smells are; point them out, and the agents will produce a much better implementation on that second pass. And have those people store the prompts in the repo! I put my specifications in ./doc/spec/*.md
Every time I got bad results, looking back I noticed my spec was just vague or relied on assumptions. Of course you can't fix your colleagues; if they suck, they suck, and somebody's got to do the mopping :)
I think it would make sense to have these issues bubble up into the public consciousness of hackernews.
I've never used AI to code; I'm a software architect and currently assume I'd get little value out of an LLM. It would be useful for me if this debate had a vaguely engineering-smelling quality to it, because it's currently just two groups shouting at each other and handwaving criticism away.
If you actually deal with AI-generated problems, I'd love it if you made a post about them so we have something concrete to point to.
PRs where somebody clearly doesn't know the tech being used well enough, or enough about how the complex app they're working on really works, and thus can't tell a good design from a bad one for the feature they're building, but has AI-assisted themselves to something which "works", can become an absolute death spiral.
I wasted so much work time trying to steer one of these towards the light, which is very demotivating when design and "why did you do this?" questions are answered with nothing but another flurry of commits. Even taking the time to fully understand the problem and suggest an alternative design that would fix most of the major issues did nothing (nothing useful must have emerged when that was fed into the coin slot...).
Since I started the review, I ended up becoming the "blocker" for this feature once people started asking why it hadn't landed yet (because I also have my own work to do), to the point where I just hit Approve, knowing it wouldn't work at all for the even more complex use cases I needed to implement in that area soon, and that I could just fix/rewrite it then.
From my own experience, the sooner you accept code from an LLM, the worse a time you're going to have. If it wasn't a good solution, or was even the wrong solution from the get-go, no amount of churning away at the code with an LLM will fix it. And if you _don't know_ how to fix it yourself, you can't suddenly go from reporting great progress in stand-ups to "I have nothing" - maybe backwards progress is one of those new paradigms we'll have to accept?
I think I spend too much time at work fixing the greatness of AI. Here is a sample.
We are talking about a "stupid" tool that parses a Google Sheet and makes calls to a third-party API.
There is one Google Sheet per team, with one column per person and one line per day, and each day one person is in charge of the duty.
The tool grabs the data from the sheet and configures PagerDuty so that alerts go to the right person.
Very basic, no cleverness needed, really straightforward.
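To show how little is actually needed, here is a rough sketch of the core decision; the row layout and every name in it are illustrative assumptions on my part, not the real code:

from datetime import date

def on_call_for_today(rows: list[dict[str, str]]) -> str:
    """Each row is one day; the non-empty cell in a person's column marks who is on duty.
    Example row (layout assumed): {"date": "2024-05-01", "alice": "", "bob": "x"}."""
    today = date.today().isoformat()
    row = next(r for r in rows if r["date"] == today)
    return next(name for name, cell in row.items() if name != "date" and cell.strip())

Everything else is one read of the sheet and one PagerDuty call to route alerts to that person.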
So we have one person who wrote the code, with AI. Then a second person who checked the code (with AI). Then the shit comes to my desk, and I see this kind of cruft:
def create_headers(api_token: str) -> dict:
    """Create headers for PagerDuty API requests.

    Args:
        api_token: PagerDuty API token.

    Returns:
        Headers dictionary.
    """
    return {
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Authorization": f"Token token={api_token}",
        "Content-Type": "application/json",
    }
And then we have 5 usages like this:
def delete_override(
    base_url: str,
    schedule_id: str,
    override_id: str,
    api_token: str,
) -> None:
    """Delete an override from a schedule.

    Args:
        base_url: PagerDuty API base URL.
        schedule_id: ID of the schedule.
        override_id: ID of the override to delete.
        api_token: PagerDuty API token.
    """
    headers = create_headers(api_token)
    override_url = f"{base_url}/schedules/{schedule_id}/overrides/{override_id}"
    response = requests.delete(override_url, headers=headers, timeout=60)
    response.raise_for_status()
No HTTP keep-alive, no TCP reuse, the API key is passed down to every method, and so is the API's endpoint. The timeout is redefined in each method.
The file is ~800 lines of Python, contains 19 methods, and only deals with PagerDuty (not the Google Sheet). It took 2 full-time days.
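For contrast, a minimal sketch of what addressing exactly those complaints could look like: one requests.Session for keep-alive and connection reuse, with the token, base URL and timeout stored once. Only the header format and the delete endpoint are taken from the code above; the class and its name are illustrative, not the actual fix:

import requests

class PagerDutyClient:
    """Illustrative sketch, not the real file: token, base URL and timeout
    live in one place, and requests.Session reuses the TCP connection."""

    def __init__(self, api_token: str, base_url: str, timeout: int = 60) -> None:
        self.base_url = base_url
        self.timeout = timeout
        self.session = requests.Session()  # keep-alive / connection pooling
        self.session.headers.update({
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Authorization": f"Token token={api_token}",
            "Content-Type": "application/json",
        })

    def delete_override(self, schedule_id: str, override_id: str) -> None:
        """Same call as delete_override above, minus the repeated plumbing."""
        url = f"{self.base_url}/schedules/{schedule_id}/overrides/{override_id}"
        response = self.session.delete(url, timeout=self.timeout)
        response.raise_for_status()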
These people fail to produce anything meaningful; that's not really a surprise, given that they can't do something sane with such a basic topic.
Does AI bring good ideas? Obviously not, but we knew that.
Does AI improve the quality of the result (regardless of the quality of the idea)? Apparently not.
Does AI improve productivity? Again, given this example, no.
Are these people better, more skilled, or anything else? No.
Am I too demanding? Am I asking too much?
Try pasting that full code into Claude and prompting:
> No HTTP keep-alive, no TCP reuse, the API key is passed down to every method, so is the API's endpoint. Timeout is defined in each method. Fix all of those issues.
AI is a wonderful tool that will answer all of your questions, as long as you give it the right answer? That's probably right.
Even with normal human-written code, it's not guaranteed to come out completely correct in one shot. That's why code review and QA still exist.
The issue here is more organizational, with the engineers not getting the code up to standard before handing it off, than a limitation of the AI itself.
I'm sorry your teammates have skill issues when it comes to using these tools.