An LLM has never saved me time. It has always produced something that doesn't quite work, has the rough shape of what I want, but somehow always gets all the details wrong.
I can type up what I want much faster and be sure it's at least solving the right problem, even if it may have bugs.
There are also tools to generate boilerplate that work much much better than LLMs. And they're deterministic.
If you do not plan out the architecture soundly, no amount of prompting will fix it if it is bad. I know this because my "handmade" project made with backward compatibility and horrible architecture keeps being badly fixed by LLM while the ones that rely on preemptive planning of the features and architecture, end up working right.
LLM's keep messing up even on a plain Laravel codebase..
I think that's true, but something even more subtle is going on. The quality of the LLM output depends on how it was prompted in a way more profound than I think most people realize. If you prompt the LLM using jargon and lingo that indicate you are already well experienced with the domain space, the LLM will rollplay an experienced developer. If you prompt it like you're a clueless PHB who's never coded, the LLM will output shitty code to match the style of your prompt. This extends to architecture, if your prompts are written with a mature understanding of the architecture that should be used, the LLM will follow suit, but if not then the LLM will just slap together something that looks like it might work, but isn't well thought out.
This is magical thinking.
LLMs are physically incapable of generating something “well thought out”, because they are physically incapable of thinking.
> An LLM has never saved me time. It has always produced something that doesn't quite work, has the rough shape of what I want, but somehow always gets all the details wrong.
This reads like a skill issue on your end, in part at least in the prompting side.
It does take time to reach a point where you can prompt an LLM sufficiently well to get a correct answer in one shot, developing an intuitive understanding of what absolutely needs to be written out and what can be inferred by the model.
I’m curious about how you landed “git gud; prompt better” and not “maybe the domain I work in is a better fit for LLM code”. Or, to be a bit less generous, consider the possibility that the code you’re generating is boilerplate, marshaling, and/or API calls. A facade of perceived complexity over something that’s as complex as a filter-map or two.
Sharing my 2 cents.
In the past 2 months I've been using all the SOTA models to help me design a new DSL for narrative scripting (such as game story telling) and a c# runtime implementation o the script player engine.
The language spec and design is about 95% authored by me up to this point; I have the LLMs work on the 2nd layer: the implementation specs/guidelines and the 3rd layer: concrete c# implementation.
Since it's a new language, I consider it's somewhat new/novel tasks for LLMs (at least, not like boilerplate stuff like HTTP API or CRUD service). I'd say, these LLMs have been very helpful - you can tell they sometimes get confused and have trouble to comply to the foreign language spec and design - but they are mostly smart enough to carry out the objectives, and they get better and better after the project got on track and has plenty of files/resources to read and reference.
And I'd also say "prompt better" is a important factor, just much more nuanced/complicated. I started with 0 experience with LLM agents and have learned a lot about how to tame them, and developed a protocol to collaborate with agents, these all comes from countless trial and errors, but in the end get boiled down to "prompt better".
I wonder if my intuition here is correct; I would posit that “PL implementation” is a far more popular and well-explored field than it seems. How many toy/small/labor-of-love langs make it to Show HN? How many more simply don’t?
I’ve never personally caught the language implementation bug. I appreciate your perspective here.
I totally agree, and I was fully aware of how common people make language for fun when I replied.
But I feel like the rationale would still stands: Considering LLMs' natures, common boilerplate tasks are easy because they can kind of just "decompress" from training data. But for a new language design, unless the language is almost identical to some other captured by the model, "decompression" would just fail.
When web search first arrived, the same thing happened. That is, some people didn't like using the tool because it wasn't finding what they wanted. This is still true for a lot of folks today, actually.
It's less "git gud; prompt better", and more, "be able to explain (well) what you want as the output". If someone messages the IT guy and says "hey my computer is broken" - what sort of helpful information can the IT guy offer beyond "turn it on and off again"?
> I’m curious about how you landed “git gud; prompt better” and not “maybe the domain I work in is a better fit for LLM code”.
1. Personal experience. Lazy prompting vs careful prompting.
2. They're coincidentally good at things I'm good at, and shit at things I don't understand.
3. Following from 2, when used by somebody who does understand a problem space which I do not, they easily succeed. That dog vibe coding games succeeded in getting claude to write games because his master knew a thing or two about it. I on the other hand have no game Dev experience, even almost no hobby experience with games specifically, so I struggle to get any game code that even remotely works.
Irrespective of the domain you specifically listed in 3 (game dev is, believe it or not, one of the “more complex” domains), you have completely failed to miss the point.
> 2. They're coincidentally good at things I'm good at, and shit at things I don't understand.
This may well be! In the perfect world this would be balanced with the knowledge that maybe “the things you’re good at” are objectively* easier than “things you don’t understand”. Speaking for myself, I’m proficient in many more easy things than hard things.
*inasmuch as anything can be “objectively” easier
The parent is specifically talking about producing boilerplate code -a domain in which LLM excell at- and not having had any success at that. It's therefore not a leap of logic to assume they haven't put (enough) effort into getting better at prompting first, which is perfectly fine per se but leans towards a skill issue and not an immutable property of gen AI.
The uncomfortable fact remains that one cannot really expect to get much better results from an LLM without putting some work themselves. They aren't magical oracles.