Your tone makes me think you have already fallen in love with agents and you think they are the best thing since sliced bread, but let me give you my experience.
I am in a similar professional position to you, and I make a lot of small things in my spare time. I have found using agents very tedious and frustrating for my workflow. Initial prototyping can be ok, but when you start to get serious with code it falls apart quickly. If you don't tell the agent exactly what to do, to the letter, it will guess some things. Usually some of those guesses are wrong and don't match the functionality you expected. I find this a very frustrating place to be, trying to tell the agent which functionality is wrong and what I expect instead. Usually at this point I enter what I refer to as a doom spiral, where everything I tell the agent just takes me further from what I want, until I eventually have to revert everything it has done and try again.
This gets worse with bugs. Inevitably a bug will appear, and trying to tell the agent what the bug is and what is expected instead usually results in more broken functionality elsewhere. When I have written the codebase myself, I can usually pinpoint and fix bugs in a few minutes after diagnosing them. I have literally spent hours trying to get an agent to fix a bug without breaking something else.
I thought refactoring might be a strong point for LLMs, so I tested it by taking a monolithic codebase and asking various agents to refactor it into reusable modules with exposed API endpoints, so that I could split functions into modular chunks whilst retaining full functionality. They all failed miserably at this, breaking everything and never managing to produce a working example.
LLMs and their agents certainly are cool, and they are great at writing emails for people and summarising meeting notes. They can even write very small programs well. But let loose on serious production codebases, they can cause much more frustration than they relieve. I will come back and try another day when LLMs have evolved to the next level, but for now they can stay coding my toy projects and dictating my team's meeting notes.
My general experience is that LLMs are both really good and extremely bad. It's so easy to get into a hole of "No, not like that, like this" where it just never gets better (including with new sessions).
I find the wildly different experiences people have with LLMs fascinating, and honestly I think it's a good thing. We will need code crafters and technomancers. I don't think having only one or the other is healthy, which is why I'm very critical of mandatory LLM use in corporations.
And I don't doubt you have had your aggro with LLMs, because I've also had my fair share of issues with them. I just think we have different emotional responses to the workflow with agents. They don't work the first time and they aren't very good at sweeping, large sets of loosely related changes. They need to focus on one feature only and crunch it to the end.
Honestly though, I haven't had the chance to work in large codebases, but with the projects I have worked on I had lots of success and I found the workflow very stimulating: reading the solutions the LLM comes up with, some very interesting and some comically bad. More often than not I'll pick up a technique or an approach I hadn't thought about. Worst case, it's something I can bounce ideas off of.
About bugs, I have the opposite impression. I find it really interesting to get some functionality wrong, provide the agent with the logs and context, explain the issue in detail, and have it help me explore the codebase to identify and fix the issue. So far I've never had an instance where I couldn't fix the bug or where I left the session in a worse mental state than I entered.
I'll take buzz, for instance. Before using zurg I had to use Plex because Jellyfin would only detect a single file in a folder with multiple files. Codex created the presentation layer I described in a single go and it worked first time. That was really impressive, I have to say. The project also has its own WebDAV server, it integrates with debrid, and it has a persistent catalogue of media that is independent of debrid and can be used to restore previously deleted media. It has a logging UI, a config UI and a nice event system for waiting on the different independent services that it needs to orchestrate. I don't think it's a large codebase, but it's nowhere near a toy project. It also has a very capable CI pipeline that supports the development. The only part I couldn't get the agent to do well, no matter what, was the frontend implementation, maybe because I refused to use a framework and defaulted to plain JavaScript and CSS embedded in Jinja2-templated HTML files. I picked up a couple of techniques doing full stack work as an intern, so I was capable of using the browser to inspect and refine the DOM elements. One thing it did poorly, for instance, was to create all elements with block display; planning a refactor to use flexbox throughout the code really improved the UI resilience, and it was really effortless to deploy. In buzz I haven't touched most of the code, just made some adjustments in the HTML files to serve as an example for the agent of how to do it correctly (prompts are not the only way to interact with them), but I read most of the code and validated most of the functionality in merge requests, just like you'd do when working in a team.
In a nutshell, I think agents have been really capable of working in large codebases since November last year, but I don't trust them to just be let loose. They need lots of hand holding and steering, but once I got the hang of it I really felt extremely productive.
My hypothesis is that people are more likely to have success with agents the more they enjoy writing in natural language and reading code, while people that prefer coding and dislike writing text will usually prefer handcrafting their programs.