“I’d like 2 cheeseburgers, and 4 fries. No mayo or mustard. Actually make one of them a double, and one with bacon. Oh how much if I make the first a combo?”

You think this conversation could be handled with the tech of 4 years ago? Siri can’t even turn off the lights and tell me a joke in the same request. Humans do not deliver all information in order (e.g., all the instructions refer to the burgers, not the fries, but you only know that because you understand the essential nature of fries and what they typically include). That’s what AI in the drive-thru is for.
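
To make the out-of-order problem concrete, here is a rough sketch of the structured order that utterance would have to resolve to. It is hand-written, not a parser: the `Item` class, its fields, and `parse_example_order` are hypothetical names made up for illustration, and the reading of "one of them" as two different burgers is an assumption, not something the customer actually said.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical order-line record; field names are illustrative only.
    name: str
    patties: int = 1
    extras: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    combo: bool = False


def parse_example_order():
    """Return the hand-built target for the example order (no NLP happens here)."""
    burgers = [Item("cheeseburger") for _ in range(2)]
    fries = [Item("fries") for _ in range(4)]

    # "No mayo or mustard": applies to the burgers only, since fries
    # do not normally come with either.
    for burger in burgers:
        burger.removed += ["mayo", "mustard"]

    # "make one of them a double, and one with bacon": ASSUMED to mean
    # two different burgers; the utterance itself is ambiguous.
    burgers[0].patties = 2
    burgers[1].extras.append("bacon")

    # "how much if I make the first a combo?" is a price question,
    # not an order change, so no item is marked as a combo.
    pending_questions = [("combo_price", 0)]

    return burgers + fries, pending_questions
```

Every comment in that function is a decision the system has to get right from world knowledge rather than from word order, which is the whole point.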

I'm not sure current tech could reliably take that order, honestly. There's essentially 0 chance it would try to disambiguate the meaning of "one of them", and from there it's a tossup whether you'll get a double cheeseburger, a double box of fries, or double mayo.

Current tech is pretty dang close. I gave the order to ChatGPT and it parsed it almost perfectly [0], even handling the ambiguity about what happens if you add a combo to an order that already includes several fries à la carte. The only thing it missed is that I didn't actually order the combo (I merely wanted to know how much the upgrade is), but I'm sure some fine-tuning could solve that. (Come to think of it, a fast food restaurant would consider this implicit upsell as a feature.)

The main challenge AI would face is people who come by at 3 AM drunk and stoned, indecisively slurring through their order, but I imagine there'd be a system to redirect these edge cases to an actual human.

[0] https://chatgpt.com/share/68ba2233-9f48-8011-905a-c69cc5e91b...

Pretty dang close isn't the same as accurate for an exchange of time and money. Voice->text with a noisy background is a particularly hard problem, especially with hardware not designed to limit background noise. Try it. Whisper is still the leading speech->text model in our tests, but once you add noise reduction, echo, diarization, etc., it's a hard problem.

>Come to think of it, a fast food restaurant would consider this implicit upsell as a feature.

Yeah, just what every restaurant manager wants: to deal with customers who paid more for things they didn't order.

It can't. Not reliably. I think every major chain that was trying it has ripped it out.

It'll definitely be a thing within 5 years, max, but it's not mature enough for production yet.

I agree with the sibling replies, but maybe more tangentially: why is it that sometimes the point of these things is that I don't have to modify my behavior at all while the restaurant pays one less person, but other times the point is all about modifying my behavior so the company can pay one less person?

Like here: if the restaurant really wants to get rid of their intercom person, why not make it self checkout, no AI required? What is actually saved or gained either way? There is nothing intrinsic about this situation that requires me to use natural language to order something. People order tons of food online these days anyway!

Like, I just don't think it makes sense, and I also doubt the economics of this would work out at fast food restaurant scale.

Again, just step back and think about it for a moment: lots of this really doesn't make sense. The world is not really full of tasks a good prompt can solve. There are a million things that aren't "produce this Python script" or "summarize this article probably correctly."

Why can't it just be what it is? Why does it absolutely have to be everything or nothing? So much of the thought around this feels so clearly wrongheaded that it's starting to feel truly absurd.

To be fair, this would trip up a lot of _humans_ as well.