When the AI gladly accepts orders that cost hundreds of thousands of dollars or 18,000 cups of water, it’s probably not production ready
https://youtube.com/shorts/FDZj6DCWlfc
https://www.tiktok.com/@90daygrinder/video/75355084374472983... (another example from a different chain)
I feel like we watched different videos. It seemed like the AI (or some other monitoring system) recognized a problem with the 18,000-cups-of-water order and quickly handed off to a real human. That instance looked pretty production ready to me.
I interpreted it as the AI system adding something strange to the order, and the system being cut off only once someone noticed. Otherwise, the next word sounded like a confirmation.
That said, this is not the only video floating around of this type of system failing to handle edge cases gracefully.
I suspect the human worker still had a headset to listen in to the orders at the drive-through and just intervened when she heard that order.
Regardless, looks like you can't replace everyone with A.I. just yet.
This is solved easily by one additional sanity-check API call to a different AI. I'm not sure why people treat these bugs as complete, insurmountable showstoppers. It's a quick fix.
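The "quick fix" described above could look something like the sketch below, in Python. Everything here is illustrative: `plausibility_check` and `stub_model` are invented names, and in production `ask_model` would wrap a real second LLM endpoint rather than a toy heuristic.

```python
# Hypothetical sketch: cross-check an order with a second, independent
# model before confirming it. Not a real API; names are made up.

def plausibility_check(order: dict, ask_model) -> bool:
    """Ask an independent model whether the order looks sane.

    `ask_model` is any callable that takes a prompt string and returns
    "yes" or "no"; in production it would call a second LLM.
    """
    prompt = (
        "Is this a plausible single drive-through order? "
        f"Answer yes or no: {order}"
    )
    return ask_model(prompt).strip().lower() == "yes"

# Stand-in for the second model: a trivial heuristic, for demonstration only.
def stub_model(prompt: str) -> str:
    return "no" if "18000" in prompt else "yes"

order = {"item": "cup of water", "quantity": 18000}
if not plausibility_check(order, stub_model):
    order = None  # escalate to a human instead of confirming
```

Whether a second non-deterministic model actually catches these cases reliably is, of course, exactly what the replies below dispute.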
If Anthropic couldn't achieve that with Project Vend [0], why do you seem to think that everyone else could?
> Claudius, believing itself to be a human, told customers it would start delivering products in person, wearing a blue blazer and a red tie. The employees told the AI it couldn’t do that, as it was an LLM with no body.
> Alarmed at this information, Claudius contacted the company’s actual physical security — many times — telling the poor guards that they would find him wearing a blue blazer and a red tie standing by the vending machine.
[0] https://techcrunch.com/2025/06/28/anthropics-claude-ai-becam...
Taco Bell knows and controls its own menu, and the valid options are already directly encoded in their POS system, including purchase limits. Why would you call out to a different non-deterministic model instead of validating against the complete and deterministic data you already have? Taco Bell can afford 1-2 engineers to manage that.
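The deterministic check being argued for is a few lines of ordinary code. A minimal sketch in Python; the menu contents, prices, and limits here are invented for illustration, since the real data would come from the POS system:

```python
# Minimal sketch of deterministic order validation against a known menu.
# MENU and its limits are made up; a real POS system would supply them.

MENU = {
    "crunchy taco": {"price": 1.79, "max_qty": 20},
    "cup of water": {"price": 0.00, "max_qty": 5},
}

def validate_order(items):
    """Return a list of problems; an empty list means the order is valid."""
    problems = []
    for name, qty in items:
        entry = MENU.get(name.lower())
        if entry is None:
            problems.append(f"unknown item: {name}")
        elif qty > entry["max_qty"]:
            problems.append(
                f"{name}: quantity {qty} exceeds limit {entry['max_qty']}"
            )
    return problems

# An 18,000-cup order is rejected before it ever reaches confirmation.
print(validate_order([("cup of water", 18000)]))
```

No second model, no extra latency, and the failure mode is a predictable rejection rather than another generation.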
This would work better if the LLM were used for the human interface but traditional logic handled the ordering API and its sanity checks. I.e., it's fine if the LLM bugs out on occasion, as long as rigorous boundaries limit the amount of risk associated with that.
AIs are not resilient against deliberate attacks, even if you use multiple different models.
Maybe they shouldn't work with customers then; retail workers have to deal with hostile customers all the time.
The net result would surely be that retail workers only get to deal with hostile (or just difficult) customers while the LLM handles the easy ones. That's what has happened with every other technology introduced to retail: less "business as usual" and "overhead" work and more "oddball" handling. The electronic POS and intercom systems already have the same kind of effect.
It seems so, and yet here we are
There are other videos out there (not just of Taco Bell's implementation per se) of these systems bugging out.