It’s good at doing stuff like “host this all in Docker. Make a Postgres database with a Users table. Make a FastAPI CRUD endpoint for Users. Make a React site with a homepage, login page, and user dashboard”.
It’ll successfully produce _something_ like that, because there’s millions of examples of those technologies online. If you do anything remotely niche, you need to hold its hand far more.
The more complicated your requirements are, the closer you are to having “spicy autocomplete”. If you’re just making a CRUD React app, you can talk in high-level natural language.
Did you try Claude Code and spend actual time going back and forth with it, reviewing its code and providing suggestions, instead of just expecting things to work on the first try with minimal requirements?
I see claude code as pair programming with a junior/mid dev that knows all fields of computer engineering. I still need to nudge it here and there, it will still make noob mistakes that I need to correct and I let it know how to properly do things when it gets them wrong. But coding sessions have been great and productive.
In the end, I use it when working with software that I barely know. Once I'm up and running, I rarely use it.
> Did you try Claude Code and spend actual time going back and forth with it, reviewing its code and providing suggestions, instead of just expecting things to work on the first try with minimal requirements?
I did, but I've always approached LLMs for coding this way and I have never been let down. You need to be as specific as possible and be a part of the whole process. I have no issues with it.
FWIW, I used Gemini to write an Objective-C app for Apple Rhapsody (!) that would enumerate the drivers currently loaded by the operating system (more or less the same level of difficulty as the OP's task, I'd say?), using the PDF manual of NeXTSTEP's DriverKit as context.
It... sort of worked well? I had to go back and forth a few times because it tried to use Objective-C features that didn't exist back then (e.g. ARC), but all in all it was a success.
So yeah, niche things are harder, but on the other hand I didn't have to read 300 pages of stuff just to do this...
I remember writing Obj-C naturally by hand, before Swift was even a twinkle in Tim Cook's eye. One of my favorite languages to program in; I had a lot of fun writing iOS apps back in the day.
I remember Obj-C; using it was a profound experience. It was so different from other languages that I felt like an anthropologist.
Also, fun names like `makeFunctionNameInCommentLongAndDescriptiveWithNaturalLanguage:(NSLanguage *)language`
I agree, but I think there's an important distinction to be made.
In some cases, it just doesn't have the necessary information because the problem is too niche.
In other cases, it does have all the necessary information but fails to connect the dots, i.e. reasoning fails.
It is the latter issue that is affecting all LLMs to such a degree that I'm really becoming very sceptical of the current generation of LLMs for tasks that require reasoning.
They are still incredibly useful of course, but those reasoning claims are just false. There are no reasoning models.
In other words, the vibe coders of this world are just redundant noobs who don't really belong in the marketplace. They've written the same bullshit CRUD app every month for the past couple of years, and now they've turned to AI to speed things up.
Last week I asked Claude to improve a piece of code so that instead of downloading all AWS RDS certificates, it downloads just the ones needed for that AWS region. It figured out several ways to determine the correct region, made a nice tradeoff, and suggested the most reliable way. It rewrote the logic to download the right set, doing some research along the way to figure out the right endpoint. It only made one mistake: its fallback mechanism was picking EU, which was not correct. Maybe 1 hour of work. On my own it would have taken me close to a working day to figure it all out.
This is just a thought experiment.
I don't mean to tread on anyone's toes, but I'm noticing this more and more in the debates around AI. Imagine there are developers out there who could have done this task in 30 minutes without AI.
The level of performance of AI solutions is heavily related to the experience level of the developer and to the problem space being tackled - as this thread points out.
Unfortunately the marketing around AI ignores this and makes every developer not using AI for coding seem like a dinosaur, even though they might well be faster at solving their particular problems.
AI is moving problem-solving skills from coding to writing the correct prompts and teaching the AI to do the right thing - which, again, is subjective, since the "right thing" for one developer isn't the "right thing" for another developer. "Right thing" being the correct solution, the understandable solution, the fastest solution, etc., depending on the needs of the developer using the AI.
IMHO, the thirty minute developer would still save 10 minutes by vibe coding. That marketing's not wrong.
Spelling out exactly what you want and checking/fixing what you receive is still faster than typing out the code. Moreover, nobody's job involves nothing but brainiac coding, day after day. You have to clean up and lay foundations, whatever level you are at.
> IMHO, the thirty minute developer would still save 10 minutes by vibe coding. That marketing's not wrong.
For me, that's too general. Of course, perhaps for this particular, specific problem it might be true. But as this thread points out, anything niche and AI fails to help productively. Of course then comes the marketing: just wait, AI will be able to cover those niche cases also.
> Spelling out exactly what you want and checking/fixing what you receive is still faster than typing out the code
Then I do wonder why there are developers at all. After all, that's what AI is supposedly so good at - if one believes the marketing - being precise and describing exactly what needs to be done. Surely it must be faster to have two AIs talking to each other and hammering out the code.
And even typing speed is subjective: ten fingers versus two, versus four, etc. There are developers who can type faster than they can think - in certain cases.
There is also the developer in flow versus the stop-and-go of using AI prompts to get it just right. I dunno; if it comes true, then thankfully there won't be any humans left to create bugs in code, but somehow I can't see it happening.
There are two ways to do this. One is to one-shot or maybe few-shot a solution. Maybe this works. Maybe it doesn't. Sometimes it works if you copy a solution from [Product 1] to [Product 2] and say "Fix this."
The other is to look at the non-working solution you get, read through it, and think "Oh, I didn't know about that framework/system/product/library, that's neat" and then do some combination of further research and more hand-holding to get to something that does work.
This is useful, more or less, no matter what your level.
It's also good for explaining core industry tooling you've maybe never used before. If you're new to Postgres/NoSQL/AWS/Docker/SwiftUI/whatever it can talk you through it and give you an instant bootcamp with entry-level examples and decent solutions.
And for providing fixes for widely known bugs and issues in products that may not be widely known to you (yet.)
IME ChatGPT5 is pretty solid with most science/tech up to undergrad. It gets hallucinatory past that, and it's still flattering, which is annoying, but you can tell it to cut that out.
Generally you can use it as a dumb offshore developer, or as an infinitely patient private tutor.
That latter option is very useful. The first, not always.
> The level of performance of AI solutions is heavily related to the experience level of the developer and of the problem space being tackled - as this thread points out.
>
> Unfortunately the marketing around AI ignores this and makes every developer not using AI for coding seem like a dinosaur, even though they might well be faster in solving their particular problems.
You're not necessarily wrong, but I think it's worth noting that very few developers are only ever coding deep in their one domain that they're good at. There's just too many things to be deeply good at everything. For example, it's common that infra and CI tasks are stuff that most developers haven't learned by heart, because you don't tend to touch them very often.
Claude shines here. I've made a lot more useful GitHub Actions jobs recently: while I could automate something myself, if I know I'm going to have to look up API docs (especially for multiple APIs I'm not super familiar with), I tend to figure that the automation will lose the trade-off against just doing the task by hand (see https://xkcd.com/1205/). Claude being able to hash those out rapidly, and in a way where it's easy to verify it's doing the right thing, has changed that arithmetic for me substantially.
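For the record, the kind of job meant here is usually only a handful of YAML. A minimal sketch (the job name, schedule, and script path are all hypothetical placeholders):

```yaml
# Hypothetical scheduled workflow: runs a script nightly.
name: nightly-report
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Hypothetical script; the point is that the glue is small
      # and easy to review even if an LLM drafted it.
      - run: node scripts/report.js
```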
> Maybe 1 hour of work. On my own it would have taken me close to a working day to figure it all out.
1. Find out how to access metadata about the node running my code (assumption: some kind of an environment variable) [1-10 minutes depending on familiarity with AWS]
2. Google "RDS certificates" and find the bundle URL after skimming the page [1] for important info [1-5 minutes]
3. Write code to download the certificate bundle, fallback being "global-bundle.pem" if step 1 failed for some reason? [5-20 minutes depending on all the bells and whistles you need]
Did I miss anything or completely misunderstand the task?
[1] https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Using...
edit: I asked Claude Sonnet 4 to write robust code for a Node.js application that downloads the RDS CA bundle for the AWS region the code is currently running in and saves it at the supplied filesystem path.
0. It generated about 250 lines of code
1. Fallback was us-east (not global)
2. The download URLs for each region were hardcoded as KV pairs instead of being constructed dynamically
3. Half of the regions were missing
4. It wrote a function that verifies whether the certificate bundle looks valid (i.e. includes a PEM header)... but only calls it on the next application startup, instead of doing so before saving a potentially invalid certificate bundle to disk and proceeding with the application startup.
5. When I complained that half of my instances are downloading global bundles instead of regional ones (because they're not present in the hardcoded list), it:
- incorrectly concluded that not all regions have CA bundles available and hardcoded a duplicate list in 2 places containing regions that are known to offer CA bundles (which is all of them). These lists were even shorter than the last ones.
- wrote a completely unnecessary function that checks whether a regional CA bundle exists with a HEAD request before actually downloading it with a GET request, adding another 50 lines of code
Now I'm having to scrutinize 300 lines of code to make sure it's not doing something even more unexpected.
I think the majority of coders out there write the same CRUD app over and over again in different flavors. That's what the majority of businesses seem to pay for.
If a business needs the equivalent of a Toyota Corolla, why be upset about the factory workers making the millionth Toyota Corolla?
> I think the majority of coders out there write the same CRUD app over and over again in different flavors
In my experience, that's not entirely true. Sure, a lot of apps are CRUD apps, but they are not the same. The spice lies in the business logic, not in programming the CRUD operations. And then of course there's scaling, performance, security, organization, etc.
Good thing LLMs are really good at unique business logic, scaling, performance, security, organization, etc etc.!
(edit: /s to indicate sarcasm)
Yeah, my experience with LÖVR [0] and LLMs (ChatGPT) has been quite horrible. It's very niche, and quite a big API change happened fairly recently, which I guess the model wasn't trained on. So it's kind of useless for that purpose.
---
[0]: https://lovr.org