> There is an initial honeymoon where the LLMs blow your mind out.
What does this even mean?
In the first year and a half after ChatGPT's release, they lied to me 100% of the time I used them, so I completely missed this honeymoon phase. The first time one answered without problems was about two months ago, and that was also the first time one of them (ChatGPT) answered better than Google/Kagi/DDG could. Even yesterday, I tried to get Claude Opus to tell me when the next concert at Arena Wien is, and it failed miserably. I tried other Anthropic models too, and all of them failed. It successfully parsed the venue's page of upcoming events, then failed miserably anyway: sometimes it answered with events from the past, sometimes with events in October. The closest it got was 21 August. When I asked what was on 14 August, it apologized and admitted I was right. When I asked about "events", it simply ignored all of the movie nights. When I asked about those specifically, it acted as if I had started a new conversation.
The only time they produced anything comparable to my code in quality was when they had a ton of examples of tests that looked almost the same. Even then they made mistakes… and since I basically had to change two lines, copy-pasting would have been faster.
There was an AI advocate here who was so confident in his AI skills that he showed exactly what most people here try to avoid showing: a recording of how he works with AIs. Here is the catch: he showed the same scenario. There were already examples, and the new code needed only minimal modifications. Even then, copy-pasting would have been quicker and would have contained fewer mistakes… mistakes which he kept in the code, because it didn't fail right away.