> tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

...and then it still doesn't actually fix it

So, I recently did my first couple of heavily AI-augmented tasks for hobby projects.

I wrote a TON of LVGL code. The placement wasn’t perfect at first, but after a couple of iterations it fixed almost all of the issues. The result is a little hacked together, but a bit better than my typical first pass at UI code. I think this saved me a factor of 10 in time. Next I am going to see how much of the cleanup and refactoring of that pile of code it can do.

Next I had it write a bunch of low-level code to init hardware. It saved me a little time compared to reading the reference manual, and was more pleasant, but it wasn’t perfectly correct. If I did not have domain expertise, I would not have been able to complete the task with the LLM.

When you say it saved you time by a factor of 10, have you actually measured that properly? I initially also had the feeling that LLMs saved me time, but in the end they didn't. I roughly compared my performance to past performance by the number of stories done, and LLMs made me slower even though I thought I was saving time...

From several months of deep work with LLMs, I think they are amazing pattern matchers, but not problem solvers. They suggest a solution pattern based on their trained weights. This can even result in real solutions, e.g. when programming Tetris or the like, but not when working on somewhat unique problems...

I am pretty confident. The last similar LVGL thing I did took me 10-12 hours, and that was with a quicker iteration loop (running locally instead of on the test hardware). Here I spent a little more than an hour, testing on real hardware, and the last 20 minutes was nitpicking.

Writing front-end display code and instantiating components to look right is very much playing to the model’s strength, though. A carefully written sentence plus context would become 40 lines of detail-dense but formulaic code.
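
To make "formulaic" concrete, here is a rough sketch (my own illustration, not code from the project above, and assuming LVGL's v8-style API) of how one sentence like "a centered status label above a progress bar at 80% width" expands into a screen's worth of setup calls:

    #include "lvgl.h"

    /* Hypothetical sketch: the kind of detail-dense but formulaic
     * UI code described above. One sentence of intent becomes many
     * lines of object creation, sizing, and alignment. */
    static void build_status_screen(void)
    {
        lv_obj_t *scr = lv_scr_act();

        /* Centered status label, nudged above the middle */
        lv_obj_t *label = lv_label_create(scr);
        lv_label_set_text(label, "Updating firmware...");
        lv_obj_align(label, LV_ALIGN_CENTER, 0, -20);

        /* Progress bar at 80% of the parent width, just below the label */
        lv_obj_t *bar = lv_bar_create(scr);
        lv_obj_set_size(bar, lv_pct(80), 12);
        lv_obj_align(bar, LV_ALIGN_CENTER, 0, 20);
        lv_bar_set_range(bar, 0, 100);
        lv_bar_set_value(bar, 40, LV_ANIM_OFF);
    }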

(I have also had a lot of luck asking it to make a first pass at typesetting things in TeX, too, for similar reasons.)

There was a recent study that found that LLM users in general tend to feel more productive with AI while actually being less productive.

Presumably the study this very HN discussion is responding to.

Heh, yep. Guess I sometimes forget to read the content before commenting too.

> If I did not have domain expertise I would not have been able to complete the task with the LLM.

This kind of sums up my experience with LLMs too. They save me a lot of time reading documentation, but I need to review a lot of what they write, or it will just become too brittle and verbose.

I was trying out Copilot recently for something trivial. It made the change as requested, but also added a comment that stated something obvious.

I asked it to remove the comment, which it enthusiastically agreed to, and then... didn't. I couldn't tell if it was the LLM being dense or just a bug in Copilot's implementation.

Some prompts can help:

"Find the root cause of this problem and explain it"

"Explain why the previous fix didn't work."

Often, it's best to undo the action and provide more context/tips.

And often, switching to Gemini 2.5 Pro when Claude is stumped helps a lot.

My favourite recent experience was watching it switch multiple times between using a library function and rolling its own implementation, each time claiming that it was "simplifying" the code and making it "more reliable".

Sometimes it does... sometimes.

I recently had a nice conversation with an LLM looking for some reading suggestions. The first round of suggestions was superb: some of them I'd already read, some were entirely new and turned out great. Maybe a dozen or so great suggestions. Then it was like squeezing blood from a stone, but I did get a few more. After that it was like talking to a babbling idiot: repeating the same suggestions over and over, failing to listen to instructions, and generally just being useless.

LLMs are great on the first pass, but the further you get from that, the more they degrade into uselessness.

Yeah, when I first heard about "one-shot"ing, it felt more like a trick than a useful heuristic, but over time my experience has come to mirror yours. Nowadays I try to one-shot small-ish changes instead of going back and forth.

I've had some luck in these cases prompting "your context seems to be getting too bloated. summarize this conversation into a prompt that I can feed into a new chat with a fresh context. make sure to include <...>".

Sometimes it works well the first time, and sometimes it spits out a summary where you can see what it is confused about, and you can guide it to create a better summary. Sometimes just having that summary in its context gets it over the hump and you can just say "actually I'm going to continue with you; please reference this summary going forward", and sometimes you actually do have to restart the LLM with the new context. And of course sometimes there's nothing that works at all.

I’ve had really good luck with having GPT generate a very, very detailed todo list, then having Claude use it to check items off. Still far from perfect, but since doing that I haven’t run into context issues: I can just start a new chat and feed it the todo (which also contains the project info).