It's funny that what the author identifies as "the reality check":

  Here’s the reality check: One panelist mentioned that 95%
  of AI agent deployments fail in production. Not because the 
  models aren’t smart enough, but because the scaffolding 
  around them, context engineering, security, memory design, 
  isn’t there yet.
could be a reasonable definition of "understanding the problem to solve."

In other words, everything identified as what "the scaffolding" needs is what qualified people provide when delivering solutions to problems people want solved.

They fail because the “scaffolding” amounts to building the complicated expert system that AI promised we would not have to build.

If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.

> If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.

Well said and I could not agree more.

> If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.

You might even be able to put a UI on it that is a lot more effective than asking the user to type text into a box.

Very interesting that you've found ways to mitigate the hallucination issue. Are you able to share more about what worked for you with the post-processor and parser?
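
For anyone wondering what a "strict parser plus output post-processor" might look like in practice, here is a minimal sketch. It is not the setup described upthread; the ticket schema, the action names, and the helpers `parse_strict` and `post_process` are all hypothetical, made up for illustration. The idea is only that the model's raw text is rejected unless it parses into an exact shape and every value checks out against data you already trust.

```python
import json

# Hypothetical schema for illustration: the model must return exactly these fields.
ALLOWED_ACTIONS = {"refund", "escalate", "close"}
REQUIRED_FIELDS = {"action": str, "ticket_id": str, "reason": str}


class RejectedOutput(ValueError):
    """Raised when model output fails a structural or business check."""


def parse_strict(raw: str) -> dict:
    """Accept only a JSON object with exactly the expected keys and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Chatty preambles such as "Sure! Here is the JSON:" end up here.
        raise RejectedOutput(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise RejectedOutput("top-level value must be an object")
    if set(data) != set(REQUIRED_FIELDS):
        raise RejectedOutput(f"unexpected or missing keys: {sorted(data)}")
    for key, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data[key], expected_type):
            raise RejectedOutput(f"{key} must be {expected_type.__name__}")
    return data


def post_process(data: dict, known_ticket_ids: set[str]) -> dict:
    """Check parsed values against facts we hold ourselves, not the model's word."""
    if data["action"] not in ALLOWED_ACTIONS:
        raise RejectedOutput(f"unknown action: {data['action']!r}")
    if data["ticket_id"] not in known_ticket_ids:
        # A made-up ticket ID is the classic hallucination this layer catches.
        raise RejectedOutput(f"unknown ticket: {data['ticket_id']!r}")
    return data
```

Note how this lines up with the point being made in the thread: the allowed-action set and the ticket lookup are the business logic, and they sit entirely outside the model.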

You see, in order to get the AI agent to do its job, we needed to write a lot of software to provide it with guardrails so that it doesn't lose its mind when doing so.

Might as well just write the AI agent part of the software yourself, too.

At work we're deploying a chatbot to help users with our internal tools, and it's just a forcing function to write, and mark as deprecated, the documentation we never maintained in the first place.

So...

The bot, to its credit, returns some decent results. But my guess is that it will be quite a while before we see it in prod, since a lot of these projects go from 0 to 80% in a week and from 80% to deployable in several years.

It is really just BS. This is just basic DSA stuff. We deployed a real-world solution by doing all of that on our side. It's not magic. It's engineering.