This is solved easily by one additional sanity-check API call to a different AI. I’m not sure why people think these bugs are complete, insurmountable showstoppers. It’s a quick fix.

If Anthropic couldn't achieve that with Project Vend [0], why do you seem to think that everyone else could?

> Claudius, believing itself to be a human, told customers it would start delivering products in person, wearing a blue blazer and a red tie. The employees told the AI it couldn’t do that, as it was an LLM with no body.

> Alarmed at this information, Claudius contacted the company’s actual physical security — many times — telling the poor guards that they would find him wearing a blue blazer and a red tie standing by the vending machine.

[0] https://techcrunch.com/2025/06/28/anthropics-claude-ai-becam...

Taco Bell knows and controls its own menu, and the valid options are already directly encoded in their POS system, including purchase limits. Why would you call out to a different non-deterministic model instead of validating against the complete and deterministic data you have? Taco Bell can afford 1-2 engineers to manage that.

This would be better off if the LLM were used for the human interface but traditional logic were used for the ordering API and its sanity checks. I.e. accept that the LLM can bug out on occasion, but keep rigorous boundaries around the amount of risk associated with that.
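To make that split concrete, here's a rough sketch of what the deterministic side could look like. Everything in it is made up for illustration: the `MENU` entries, the `max_qty` limits, and the `OrderLine` / `validate_order` names are assumptions, not anything from Taco Bell's actual POS. The idea is only that the LLM turns speech into structured order lines, and plain code decides whether those lines are allowed.

```python
from dataclasses import dataclass

# Hypothetical menu data; a real system would load items and limits from the POS.
MENU = {
    "crunchy taco": {"max_qty": 20},
    "water cup": {"max_qty": 10},
}


@dataclass
class OrderLine:
    item: str       # item name as parsed by the LLM front end
    quantity: int   # quantity as parsed by the LLM front end


def validate_order(lines, menu=MENU):
    """Return a list of problems; an empty list means the order passes."""
    problems = []
    for line in lines:
        entry = menu.get(line.item)
        if entry is None:
            # The LLM produced something that is not on the menu.
            problems.append(f"unknown item: {line.item}")
        elif not isinstance(line.quantity, int) or not 1 <= line.quantity <= entry["max_qty"]:
            # Quantity is malformed or outside the per-item purchase limit.
            problems.append(
                f"quantity {line.quantity} not allowed for {line.item} "
                f"(max {entry['max_qty']})"
            )
    return problems
```

An order only goes through if `validate_order(...)` comes back empty; anything else gets bounced back to the customer or handed to a human. No second model is involved in that decision, so a confused or manipulated LLM can at worst propose an order that gets rejected.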

AIs are not resilient against deliberate attacks, even if you use multiple different models.

Maybe they shouldn't work w/ customers then; retail workers have to deal w/ hostile customers all the time.

The net result would surely be retail workers only get to deal with hostile (or just difficult, even) customers while the LLM deals with the easy ones. That's what has happened with every other technology introduced to retail - less "business as usual" and "overhead" work and more "oddball" handling. E.g. the electronic PoS and intercom system already have the same kind of effect.

It seems so, and yet here we are

There are other videos out there (not just of Taco Bell’s implementation per se) of these systems bugging out.
