> they fail convincingly

To someone that has paid zero attention and/or deliberately ignores any coverage about the numerous (and often hilarious) ways that spicy autocomplete just completely shits the bed? Sure, maybe.

That’s fair — if you’re already skeptical and paying attention, the failures are obvious and often funny. The risk tends to show up more with non-experts or downstream systems that assume the output is trustworthy because it looks structured and confident.

Autocomplete failing loudly is annoying; autocomplete failing quietly inside automation is where things get interesting.

> The risk tends to show up more with non-experts

This hits a key point that isn't emphasised enough. A few interactions with technology and people have shaped my view:

I fiddled with Apple's Image Playground thing sometime last year, and it was quite rewarding to see a result from a simple description. It wasn't exactly what I'd asked for, but it was close, kind of. As someone who has basically zero artistic ability it was fun to be able to create something like that. I didn't think much about it at the time, but recently I thought about this again, and I keep that in mind when seeing people who are waxing poetic about using spicy autocomplete to "write code" or "analyse business requests" or whatever the fuck it is they're using it for. Of course it seems fucking magical and fool proof if you don't know how to do the thing you're asking it for, yourself.

I had to fly back to my childhood home at very short notice in August to see my (as it turned out, dying) father in hospital. I spoke to more doctors, nurses and specialists in two weeks, than I think I have ever spoken to about my own health in 40+ years). I was relaying the information from doctors to my brother via text message. His initial response to said information was to send me back a Chat fucking GPT summary/analysis of what I'd passed along to him... because apparently my own eyeballs seeing the physical condition of our father, and a doctor explaining the cause, prognosis and chances of recovery were not reliable enough. Better ask Dr Spicy Autocomplete for a fucking second opinion I guess.

So now my default view about people who choose to use spicy autocomplete for anything besides shits and giggles like "Write a Star Trek fan fiction where Captain Jack Sparrow is in charge of DS9", or "Draw <my wife> as a cute little bunny" is essentially "yeah of course it looks like infallible magic, if you don't know how to do it yourself".

The real risk with LLMs isn’t when they fail loudly — it’s when they fail quietly and confidently, especially for non-experts or downstream systems that assume structured output equals correctness.

When you don’t already understand the domain, AI feels infallible. That’s exactly when unvalidated outputs become dangerous inside automation, decision pipelines, and production workflows.

This is why governance can’t be an afterthought. AI systems need deterministic validation against intent and execution boundaries before outputs are trusted or acted on — not just better prompts or post-hoc monitoring.

That gap between “sounds right” and “is allowed to run” is where tools like Verdic Guard are meant to sit.