Hacker News

Have been dealing with this in an area that requires insane attention: payments. It's a strange feeling when you architect a system, all the invariants, all the fundamentals, all the guardrails, then implement the scaffolding in self-documenting code so the LLM has no way to build other than correctly, and then you see what it tries to do and it's WTF.

It all seems to behave correctly, and then you run your test suite and your e2e tests start failing in weird ways: a few, but not many, accounting discrepancies, while everything else passes. You spend a lot of time asking it to explain what's happening, you give it the data to browse, and it keeps giving you very plausible explanations of "found the issue, the data shows this clearly, therefore the bug is here, all I need to do is fix this thing", and it does this, and it still fails.
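A minimal sketch of the kind of invariant check that surfaces accounting discrepancies like the ones described above. It assumes a double-entry ledger with amounts in integer cents, which the comment does not confirm; the `Entry` type, the account names and the `unbalanced_transactions` helper are all hypothetical, made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical ledger row: positive amounts are debits, negative are credits.
    txn_id: str
    account: str
    amount_cents: int


def unbalanced_transactions(entries):
    """Return {txn_id: net_cents} for every transaction whose legs do not sum to zero."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry.txn_id] += entry.amount_cents
    return {txn_id: net for txn_id, net in totals.items() if net != 0}


# Made-up sample data: t1 balances, t2 is missing its fee leg.
entries = [
    Entry("t1", "customer_cash", 1000),
    Entry("t1", "merchant_payable", -970),
    Entry("t1", "fee_revenue", -30),
    Entry("t2", "customer_cash", 500),
    Entry("t2", "merchant_payable", -485),
]

print(unbalanced_transactions(entries))  # {'t2': 15}
```

Run against the real ledger after every e2e scenario, a check like this points at the offending transaction directly instead of relying on an explanation of where the bug "must" be.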

When you open the hood, man: the code salad, the hundreds of unnecessary, complex and duplicate abstractions, the stacked mistakes and lazy corrective attempts, the comment pollution that overrides your instructions across sessions.

You realize that there are things and concepts that it just cannot wrap its "mind" around, and you need to grab the wheel for a bit, make the corrections, remove all the comment litter, commit, and then hand the wheel back and tell it to "look at the last commit to see what I mean. Explain to me what you did wrong and update all documentation, memory and context with this new understanding".

So if you have no experience in the field, you won't even know how to test or how to find that there is an issue. The appearance of "working" and the AI's confidence will trip you up in prod so hard.

In my experience Claude tends to immensely overcomplicate things and go for a complex abstraction scheme even when all it needs is two lines of code. Combined with its eagerness to just code and, more importantly, its tendency to pay more attention to the last prompt, this causes it to build an insanely complex solution first and then patch things with half-assed attempts. The whole ordeal results in code that looks okay at first glance but quickly breaks down and becomes unmanageable. A significant effort is needed to push back against Claude's tendencies, so I mainly find myself pushing back or looking for ways to write an initial prompt with enough guidance, but only Fable was following that guidance properly; Opus simply acts like a rhino in a china shop.