But is it really any faster than using an already existing code generator/scaffolding tool? How do you know your project isn’t just a regurgitation of another repository? Would it be just as fast to clone some existing project and hack on it?
These are the questions everyone seems to be ignoring when they say “only LLMs can make projects quickly”, while also ignoring everything those LLMs are built on (your LLM is probably calling a code gen tool).
For the at work side, I personally haven’t experienced any disadvantages or missed any project deadlines because I didn’t use an LLM, so what does velocity get me? Thumb twiddling time?
It reminds me of Drupal circa 2009.
I was thinking the other day how much better Drupal is. Want an online store? A few commands and bam, online store. Want a newspaper? A few commands and bam, newspaper with publishing workflows, user management, and caching.
Using coding agents isn't much different. There are several things the models are trained to do very well and a few commands will get something. If the developer wants to move the project beyond that, it requires domain knowledge and a lot of hacking.
I wonder if the coding agents will move towards the Drupal model where they create interchangeable components with common interfaces. Like Drupal, the coding agents never provide anything truly innovative that hasn't been done before.
> If the developer wants to move the project beyond that, it requires domain knowledge and a lot of hacking.
Reminds me a bit of this blog post[0].
I remember doing a Drupal project around that time and being astonished at how powerful it was.
I also remember feeling more like a technician connecting various components than like a software engineer, writing code.
I totally saw the value for the client but I really disliked my experience, so I avoided it afterwards.
0: https://www.rickmanelius.com/p/the-website-rfp-and-the-impos...
Drupal and WP etc. all have plugins to switch stuff on in minutes; however, customising and making it as your client wants would take a lot of time. WP shops we work with for clients (we need to integrate sometimes) take weeks to get some plugin to do what they want by adding tags and config options.
It might centralize around a specific framework but I think part of the problem is that people want to generate their own framework or at least not care about what the framework is/does/can do. They treat the LLM as the framework which can be non-deterministic and structureless.
> But is it really any faster than using an already existing code generator/scaffolding tool?
Yes, very much so. Our team was fast with those tools and created many of our own before this LLM AI (we used other AIs though to go faster), however it still took weeks to months from idea to launch; the same complexity now takes days, including everything. We already had rigorous processes and those really help now moving at speed. No way anyone can beat this except better AI.
But “are you really moving at speed after you generate the majority of your application?” is my other point. If you were to start working somewhere with an existing product the changes you would apply are more than likely incremental. What is the advantage of using LLMs to change 1-10 lines of code on average? How do you measure the ROI for that?
What did the time savings gain you? A quicker release date? How can you prove that? “This would have taken weeks” is the old problem of project time estimation. How can I take any engineer seriously that they think they know it saved weeks?
Considering that engineer never reliably estimated anything beyond a few days remotely accurately before… but now they can…
Yea cause it's done while you're still reading the docs for your code generator. lol
Code generators are usually one short command. It’s less typing than a prompt would take.
That statement tells me you have less experience with code generators than me and I'll just leave it at that.
Then how much time do you spend debugging and fixing the generated code? lol
Usually I know exactly what I want before hand. What structs. What protocols. How I want the event bus layered and what threads need to exist. And what make targets I want. So generally the generated code is strictly bound to my design pattern. Then it's all a matter of running it. To put it bluntly I'm running benchmarks and testing it while you're still deciding what to name your files.
And? What advantage does that have for you over me when it comes to a personal project? What does that velocity get me? More time to foolishly rewrite/regenerate the already built software from scratch? I don’t spend a lot of time naming my files personally.
So you do no validation of the code that’s generated? Just asking because you didn’t state that as a step in your process. You’re prototyping to running then you’re missing a big step that will most likely cost you later.
Why does it matter how much time I spent writing code for a project I’m most likely either not sharing or if I am sharing it can be obtained for free? Which market am I rushing to? Bluntness doesn’t seem to be an advantage other than bragging.
Your tone makes me think you already decided that agents aren't worth your time, but I'll give it a try anyways.
I work as a DevOps engineer and have been using agents exclusively to code since the beginning of the year. Agents are really nice to quickly craft utilities to speed up planning. For instance, I had it create a small CLI for me that pulls my cards from Azure DevOps, loads them as JSON, Markdown and CSV, and pushes updates once I'm done. Then I load transcripts of meetings and other written requirements into context and cross them with the current state of the repos, to have meaningfully contextualized work items without me having to implement these myself. I'll just have a long chat with the agent exploring these cards and defining the necessary refinements for description and acceptance criteria, then I just push them all at once. Anything you can think of, you just ask the agent for. In my case I don't trust code, so I have all my CLIs be no-op by default: they first print everything they will do, and if I think the changes make sense I approve them and let the script commit to the canonical board.
Working with cloud consoles like AWS in general is a huge hassle, so crafting quick inventory utilities and tools for correlating data is a breeze.
Now the work itself is mainly CI pipelines, Terraform files and automation. For these I'll base the agents on the specified work items and enrich them with my own understanding of the problem. I then launch the agents and read the agent output attentively. This is very important. You can't just prompt and leave, you need to be present all the time so you can steer the agent into solving the right problems. At the very least you need to review all the changes after an implementation session is done when you come back from making coffee. Many times it tries to create meaningless abstractions or very complicated solutions that I know can be done better. Or I have a different idea of how to organize the project so I do many follow-up sessions to refactor code.
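To make the "no-op by default" idea concrete, here is a minimal sketch of how such a CLI could be shaped. It is only an illustration under assumptions: `CardUpdate`, `run`, the `--apply` flag and the placeholder `push` are all hypothetical names, and no real Azure DevOps API is called.

```python
import argparse
from dataclasses import dataclass


@dataclass
class CardUpdate:
    # Hypothetical shape of one pending change to a work item.
    card_id: int
    field: str
    new_value: str


def describe(update: CardUpdate) -> str:
    return f"card {update.card_id}: set {update.field} -> {update.new_value!r}"


def run(updates, push, apply=False, out=print):
    """Print every planned change; call push() only when apply is True."""
    pushed = 0
    for update in updates:
        prefix = "APPLY" if apply else "DRY-RUN"
        out(f"[{prefix}] {describe(update)}")
        if apply:
            push(update)
            pushed += 1
    return pushed


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Push work item updates (no-op unless --apply is given)."
    )
    parser.add_argument("--apply", action="store_true", help="actually push the changes")
    args = parser.parse_args(argv)

    # Hypothetical data; a real tool would load this from the edited export.
    updates = [CardUpdate(101, "description", "Add acceptance criteria")]

    def push(update):
        # Placeholder: the real board API call would go here.
        print(f"pushed card {update.card_id}")

    return run(updates, push, apply=args.apply)
```

Calling `main([])` only prints the plan, while `main(["--apply"])` also invokes `push`. Keeping the side effect behind an explicit flag means a mistake in generated code shows up in the printed plan before it reaches the board.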
In my personal projects I do a lot of small utilities. I spent some weeks designing and polishing a replacement for zurg and debridmediamanager the way I like it to be, simple and to the point, also tightly integrating them with jellyfin https://gitlab.com/gabriel.chamon/buzz
I have my own micro desktop environment on top of hyprland called Archie which recently I've been redesigning and improving a lot with agents https://gitlab.com/gabriel.chamon/archie
I have my own agile based methodology for creating and managing work items with tight integration with gitlab https://gitlab.com/gabriel.chamon/orisun
I have been improving my fork of gamma-launcher so that installing and managing the game on bazzite is simpler and more automated than relying on workarounds for workflows intended for windows https://gitlab.com/gabriel.chamon/gamma-launcher
Now for how I approach developing with agents. I think it's really important to get your constraints sorted out as soon as possible, so have your agent create a CI pipeline for code quality testing, like with ruff, pyright and pytest, to control style, code consistency and cyclomatic complexity. Put in the AGENTS.md explicit instructions that the agent must run these tools at the end of every coding session. If adopting a new project, use the agent to explore the code and see which refactoring points are worth tackling. Agents really thrive on good codebases, so this first code quality improvement pass is a must.
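As a rough sketch of what "run these tools at the end of every coding session" could look like, the script below runs ruff, pyright and pytest in order and reports which ones failed. The exact invocations in `CHECKS` are assumptions (a real project would likely rely on its own configuration files for rule selection and complexity limits), and `run_checks` is a hypothetical helper, not part of any of these tools.

```python
import subprocess

# Assumed tool invocations; adjust to the project's own configuration.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("types", ["pyright"]),
    ("tests", ["pytest", "-q"]),
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order and return the names of those that failed."""
    failed = []
    for name, cmd in checks:
        # subprocess.run raises FileNotFoundError if the tool is not installed.
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(name)
    return failed


def main():
    failed = run_checks()
    if failed:
        print("failed checks: " + ", ".join(failed))
        return 1
    print("all checks passed")
    return 0
```

The same script can be what AGENTS.md tells the agent to run and what the CI job calls, so the agent and the pipeline are held to one definition of "passing".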
To sum it up, with agents you give up writing code manually for reading lots of code, exploring the domain with the help of the agent and architecting the solution at a strategic level. You trust the agent but you also verify. And lots and lots of manual testing. My personal take is that I'm infinitely productive now, only constrained by how much code and agent terminal output I can read, and also by the rate limits of the model providers and mental fatigue.
> Agents are really nice to quickly craft utilities to speed up planning.
Reminds me of a conversation I had with Kelsey Hightower where he suggested that using agents to build utilities and software was a smarter way to proceed than using agents to do the work. It is almost like the software artifacts are a cached version of your understanding of the problem and can be used over and over again until the problem (or your understanding of it) changes.
Your tone makes me think you have already fallen in love with agents and you think they are the best thing since sliced bread, but let me give you my experience.
I am in a similar professional position to you, and I make a lot of small things in my spare time. I have found using agents a very tedious and frustrating workflow. Initial prototyping can be ok, but when you start to get serious with code it falls apart quickly. If you don't tell the agent literally exactly what to do to the letter, it will guess some things. Usually some of those things are wrong, and don't match the functionality you expected. I find this a very frustrating place to be, trying to tell the agent what is the wrong functionality, and what I expect instead. Usually at this point I enter what I refer to as a doom spiral, where everything I tell the agent just takes me further from what I want, until I eventually have to revert everything it has done and try again.
This gets worse with bugs, where inevitably a code bug will appear, and trying to tell the agent what the bug is and what is expected instead usually results in more broken functionality elsewhere. When I have written the codebase manually myself, I can usually pinpoint and fix bugs in a few minutes after diagnosing them. I have literally spent hours trying to get an agent to fix a bug without breaking something else.
I thought maybe refactoring code might be a strong point for LLMs, so I tested taking a monolith codebase and asked various agents to refactor into reusable module structures with exposed api endpoints so that I could split apart functions into modular chunks whilst retaining full functionality. They all failed miserably at this, breaking everything and never managing to make a working example.
LLMs and their agents certainly are cool, and they are great at writing emails for people and summarising meeting notes. They can even create very small coded programs well. But let loose on serious production codebases and they can cause much more frustration than they solve. I will come back and try another day when LLMs have evolved again to the next level, but for now they can stay coding my toy projects and dictating my teams meeting notes.
My general experience is that LLMs are both really good and extremely bad. It's so easy to get into a hole of "No, not like that, like this" and it just never getting better (including with new sessions).
I find it fascinating the wildly different experiences people have with LLMs, and honestly I think it's a good thing. We will need code crafters and technomancers, I don't think having either one or the other is healthy, which is why I'm very critical of mandatory LLM use in corporations.
And I don't doubt you have had your aggro with LLMs, because I've also had my fair share of issues with them, I just think we have different emotional responses to the workflow with agents. They don't work the first time and they aren't very good at sweeping large sets of loosely related changes. They need to focus on one feature only and crunch it to the end.
Honestly though, I haven't had the chance to work in large codebases, but with those projects I had lots of success and I found the workflow very stimulating, reading the solutions the LLM comes up with, some very interesting and some comically bad, but more often than not I'll pick up a technique or an approach I didn't think about. Worst case it's something I can bounce ideas off of.
About bugs, I have the opposite impression. I find it really interesting to get a functionality wrong, provide the agent with the logs and context and explain in detail the issue and have it help me explore the codebase to identify and fix the issue. I've never had an instance until now that I couldn't fix the bug or that I left the session in a worse mental state than I entered.
I'll take buzz, for instance. Before using zurg I had to use Plex because jellyfin would only detect a single file in a folder with multiple files. Codex created the presentation layer I described in a single go and it worked the first time. That was really impressive, I have to say. The project also has its own WebDAV server, it integrates with debrid, and it has a persistent catalogue of media that is independent of debrid and can be used to restore previously deleted media. It has a logging UI, a config UI and a nice event system for waiting on the different independent services that it needs to orchestrate. I don't think it's a large codebase, but it's nowhere near a toy project. It also has a very capable CI pipeline that supports the development. The only part I couldn't get the agent to do well no matter what was the frontend implementation, maybe because I refused to use a framework and defaulted to plain JavaScript and CSS embedded in jinja2-templated HTML files. I picked up a couple of techniques doing full stack work when I was an intern, so I was capable of using the browser to inspect and refine the DOM elements. One thing it did poorly, for instance, was to create all elements in block display; however, planning a refactor to use flexbox throughout the code really improved the UI resilience and it was really effortless to deploy. In buzz I haven't touched most of the code, just some adjustments in the HTML files to serve as an example for the agent of how to do it correctly (prompts are not the only way to interact with them), but I read most of the code and validated most of the functionality in merge requests, just like you'd do when working in a team.
In a nutshell I think agents are really capable since November last year of working in large code bases, but I don't trust them to just be let loose. They need lots of hand holding and steering, but for me once I got the hang of it I really feel like I'm extremely productive.
My hypothesis is that people are more likely to have success with agents the more they enjoy writing in natural language and reading code, while people that prefer coding and dislike writing text will usually prefer handcrafting their programs.
I think you're right
Imagine if instead of AI generated code, we all just started copying and pasting code from open source repos.
Imagine my velocity! I cloned the Linux kernel in seconds!
Instead we're basically doing exactly that, except through an AI remixer.
It leaves a very sour taste in my mouth