> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.
Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.
Is the non-trivial amount of time significantly less than you trying to ramp up yourself?
I am still hesitant to use AI to solve problems for me. Either it hallucinates and misleads me, or it does a great job and I worry that my ability to reason through complex problems with rigor will degenerate. Once my ability to solve complex problems has degenerated, my patience has diminished, and my attention span is destroyed, I will be utterly reliant on a service that other entities own just to function in my daily life. Genuine question - are people comfortable with this?
The ramp-up time with AI is absolutely lower than trying to ramp up without AI.
My comment is specifically in contrast to working in a codebase where I'm at "max AI productivity". In a new codebase, it just takes a bit of time to work out kinks and figure out tendencies of the LLMs in those codebases. It's not that I'm slower than I'd be without AI, I'm just not at my "usual" AI-driven productivity levels.
Don't use it as a solution machine.
You use it when you know how to do something and know exactly what the solution looks like, but can't be arsed to do it. Like most UI work where you just want something in there with the basic framework to update content etc. There's nothing challenging in doing it, you know what has to be done, but figuring out the weird-ass React footguns takes time. Most LLMs can one-shot it with enough information.
You can also use it as a rubber duck, ask it to analyse some code, read and see if you agree. Ask for improvements or modifications, read and see if you agree.
>Genuine question - are people comfortable with this?
It's a question of degree, but in general, yeah. I'm totally comfortable being reliant on other entities to solve complex problems for me.
That's how economies work [1]. I neither have nor want to acquire the lifetime of experience I would need to learn how to produce the tea leaves in my tea, or the clean potable water in it, or the mug they are contained within, or the concrete walls 50 meters up from ground level I am surrounded by, or so on and so forth. I can live a better life by outsourcing the need for this specialized knowledge to other people, and trade with them in exchange for my own increasingly-specialized knowledge. Even if I had 100 lifetimes to spend, and not the 1 I actually have, I would probably want to put most of them to things that, you know, aren't already solved-enough problems.
Everyone doing anything interesting works like this, with vanishingly few exceptions. My dad doesn't need to know how to do algebra to get his taxes done, he just has an accountant. And his accountant doesn't need to know how to rewire his turn of the century New England home. And if you look at the exceptions, like that really cute 'self sufficient' family who uploads weekly YouTube videos called "Our Homestead Life"... It often turns out that the revenue from that YouTube stream is nontrivial to keeping the whole operation running. In other words, even if they genuinely no longer go to Costco, it's kind of a gyp.
[1]: https://www.youtube.com/watch?v=67tHtpac5ws
> My dad doesn't need to know how to do algebra to get his taxes done, he just has an accountant.
This is not quite the same thing. The AI is not perfect; it frequently makes mistakes or writes suboptimal code. As a software engineer, you are responsible for finding and fixing those. This means you have to review and fully understand everything that the AI has written.
Quite a different situation than your dad and his accountant.
I see your point. I don't think it's different in kind, just degree. My thought process: First, is my dad's accountant infallible?
If not, then they must themselves make mistakes or do things suboptimally sometimes. Whose responsibility is that - my dad, or my dad's accountant?
If it is my dad, does that then mean my dad has an obligation to review and fully understand everything the accountant has written?
And do we have to generalize that responsibility to everything and everyone my dad has to hand off work to in order to get something done? Clearly not, that's absurd. So where do we draw the line? You draw it in the same place I do for right now, but I don't see why we expect that line to be static.
> This means you have to review and fully understand everything that the AI has written.
Yes, and people who care and are knowledgeable do this already. I do this, for one.
But there’s no way one is giving as thorough a review as if one had written the code to solve the problem oneself. Writing is understanding. You’re trading thoroughness and integrity for chance.
Writing code should never have been a bottleneck. And since it wasn’t, any massive gains are due to being OK with trusting the AI.
I would honestly say it's more like autocomplete on steroids: you know what you want, so you just don't wanna type it out (e.g. scripts and such).
And so if you don't use it, then someone else will... But as for the models, we already have some pretty good open-source ones like Qwen, and it'll only get better from here, so I'm not sure why the last part would be a dealbreaker.
He’s not wrong.
Getting 80% of the benefit of LLMs is trivial. You can ask it for some functions or to write a suite of unit tests and you’re done.
The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.
> The last 20%, while possible to attain, is ultimately not worth it for the amount of time you spend in context hells. You can just do it yourself faster.
I'm arguing that there's a skill that has to be learned in order to break through this. As you start in a new code base, you should be quick to jump in when you hit that 20%. But, as you spend more time in it, you learn how to avoid the same "context hell" issues and move that number down to 15%, 10%, 5% of the time.
You're still going to need to jump in, but when you can learn to get the LLM to write 95% of the code for you, that's incredibly powerful.
It’s not incredibly powerful, it’s incrementally powerful. Getting the first 80% via LLM is already the incredible power. A sufficiently skilled developer should be able to handle the rest with ease. It is not worth doing anything unnatural in an effort to chase down the last 20%; you are just wasting time and atrophying skills. If you can get the full 95% in some one-shot prompts, great. But don’t go chasing waterfalls.
No, being able to push it further toward the boundary actually has an exponential-growth type of effect on productivity.
I’m making this a bit contrived, but I’m simplifying it to demonstrate the underlying point.
When an LLM is 80% effective, I’m limited to doing 5 things in parallel, since I still need to jump in 20% of the time.
When an LLM is 90% effective, I can do 10 things at once. When it’s 95%, 20 things. At 99%, 100 things.
Now, obviously I can’t actually juggle 10 or 20 things at once. However, the point is there are massive productivity gains to be had when you can reduce your involvement in a task from 20% to even just 10%. You’re effectively 2x as productive.
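Back-of-the-envelope, the claim above is just the reciprocal of your hands-on fraction (a rough sketch, assuming your own attention is the only bottleneck):

```python
# Rough model: if I must be hands-on for a fraction of each task, the number
# of tasks I can keep in flight is roughly 1 / that fraction.
for hands_on in (0.20, 0.10, 0.05, 0.01):
    print(f"LLM handles {1 - hands_on:.0%} -> ~{1 / hands_on:.0f} tasks in parallel")
```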
I’d bet you don’t even have 2 or 3 things to do at once, much less 100. So it’s pointless to chase those types of coverages.
Do you understand what parallel means? Most LLMs respond in seconds; there is no parallel work for you to do there.
Or do you mean you are using long running agents to do tasks and then review those? I haven't seen such a workflow be productive so far.
I run through a really extensive planning step that generates technical architecture and iterative tasks. I then send an LLM along to implement each step, debugging, iterating, and verifying its work. It's not uncommon for it to take a non-trivial amount of time to complete a step (5+ minutes).
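For concreteness, a minimal sketch of that loop (the step descriptions, the `run_agent` stand-in, and the verification commands are all made up for illustration, not a specific tool):

```python
import subprocess

# Illustrative only: `run_agent` stands in for whatever coding agent you drive;
# the plan -> implement -> verify structure is the point, not the tooling.
def run_agent(instruction: str) -> None:
    print(f"[agent] {instruction}")  # placeholder for the real agent invocation

plan = [  # hypothetical steps produced by the planning phase
    {"step": "add a notification-preferences table", "verify": ["pytest", "tests/notifications"]},
    {"step": "wire preferences into the mailer", "verify": ["pytest", "tests/mailer"]},
]

for task in plan:
    run_agent(task["step"])
    # Let the agent iterate against the verification command a few times,
    # then hand a stuck step back to a human.
    for attempt in range(3):
        if subprocess.run(task["verify"]).returncode == 0:
            break
        run_agent(f"verification failed (attempt {attempt + 1}); fix and re-run")
    else:
        print(f"[human] step needs intervention: {task['step']}")
```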
Right now, I still need to intervene enough that I'm not actually doing a second coding project in parallel. I tend to focus on communication, documentation, and other artifacts that support the code I'm writing.
However, I am very close to hitting that point and occasionally do on easier tasks. There's a _very_ real tipping point in productivity when you have confidence that an LLM can accomplish a certain task without your intervention. You can start to do things legitimately in parallel when you're only really reviewing outputs and doing minor tweaks.
> I'm arguing that there's a skill that has to be learned in order to break through this. As you start in a new code base, you should be quick to jump in when you hit that 20%. But, as you spend more time in it, you learn how to avoid the same "context hell" issues and move that number down to 15%, 10%, 5% of the time.
The problem is that you're learning a skill that will need refinement each time you switch to a new model. You will redo some of this learning on each new model you use.
This actually might not be a problem anyway, as all the models seem to be converging asymptotically towards "programming".
The better they do on the programming benchmarks, the further away from AGI they get.
Exactly. People delude themselves thinking this is productivity. Tweaking prompts to get it "right" is very wasteful.
> That's a wild statement. I'm now extremely productive with LLMs in my core codebases, but it took a lot of practice to get it right and repeatable. There's a lot of little contextual details you need to learn how to control so the LLM makes the right choices.
> Whenever I start working in a new code base, it takes a non-trivial amount of time to ramp back up to full LLM productivity.
Do you find that these details translate between models? It sounds like they don't translate across codebases for you?
I have mostly moved away from this sort of fine-tuning approach because of my experience a while ago with OpenAI's ChatGPT 3.5 and 4. Extra work on my end that was necessary with the older model wasn't needed with the newer one, and sometimes it counterintuitively caused worse performance by pointing the model at the way I'd do it rather than the way it might have the best luck with. ESPECIALLY for the sycophantic models, which will heavily index on "if you suggested that this thing might be related, I'll figure out some way to make sure it is!"
So more recently I generally stick to the "we'll handle a lot of the prompt nitty gritty for you" IDE or CLI agent stuff, but I find they still fall apart with large, complex codebases, and also that the tricks don't translate across codebases.
Yes and no. The broader business context translates well, but each model has its own blind spots and hyperfocuses that you need to massage out.
* Business context - these are things like code quality/robustness, expected spec coverage, expected performance needs, domain specific knowledge. These generally translate well between models, but can vary between code bases. For example, a core monolith is going to have higher standards than a one-off auxiliary service.
* Model focuses - Different models have different tendencies when searching a code base and building up their context. These are specific to each code base, but relatively obvious when they happen. For example, in one code base I work in, one model always seems to pick up our legacy notification system while another model happens to find our new one. It's not really a skill issue. It's just luck of the draw how files are named and how each of them search. They each just find a "valid" notification pattern in a different order.
LLMs are massively helpful for orienting to a new codebase, but it just takes some time to work out those little kinks.
This is like UB in compilers but 100x worse, because there's no spec, it's not even documented, and it could change without a compiler update.
It is nothing at all like UB in a compiler. UB creates invisible bugs that tend to be discovered only after things have shipped. This is code generation. You can just read the code to see what it does, which is what most professionals using LLMs do.
With the volume of code people are generating, no, you really can't just read it all. pg recently posted [1] that someone he knows is generating 10kloc/day now. There's no way people are using AI to generate that volume of code and reading it. How many invisible bugs are lurking in that code base, waiting to be found some time in the future, after the code has shipped?
[1] https://x.com/paulg/status/1953289830982664236
I read every line I generate and usually adjust things; I'm uncomfortable merging a PR I haven't put my fingerprints on somehow. From the conversations I have with other practitioners, I think this is pretty normal. So, no, I reject your premise.
My premise didn't have anything to do with you, so what you do isn't a basis for rejecting it. No matter what you or your small group of peers do, AI is generating code at a volume that all the developers in the world combined couldn't read if they dedicated 24hrs/day.