You trust your natural language instructions thousands of times a day. If you ask for a large black coffee, you can trust that's more or less what you'll get. Occasionally you may get something so atrocious that you don't dare drink it, but generally speaking you trust that the coffee shop knows what you want. If you insist on a specific amount of coffee brewed at a specific temperature, however, you need tools to measure.
AI tools are similar. You can trust them because they are good enough, and you need a way (testing) to make sure what they produce meets your specific requirements. Of course they may fail for your use case; that doesn't mean they aren't useful in other cases.
All of that is simply common sense.
To extend the analogy:
What's to stop the barista from putting sulphuric acid in your coffee? Well, mainly they don't because they need a job and don't want to go to prison. AIs don't go to prison, so you're hoping they won't do it because you've prompted them well enough.
> All of that is simply common sense.
Is that why we have legal codes spanning millions of pages?
The person I'm replying to believes that there will be a point when you no longer need to test (or review) the output of LLMs, similar to how you don't think about the generated asm/bytecode/etc of a compiler.
That's what I disagree with - everything you said is obviously true, but I don't see how it's related to the discussion.
I'm not sure we'll ever reach that point, and I'm pretty sure we'll never reach it for some higher-risk applications, because natural language is ambiguous.
There are, however, some applications where ambiguity is fine. For example, I might have a recipe website where I tell an LLM to "add a slider for the user to scale the number of servings". There's a ton of ambiguity there, but if you don't care about the exact details, I can see a future where LLMs do something reasonable 99.9999% of the time and no one does more than glance at it and say it looks fine (something along the lines of the sketch after this comment).
How long it will take to reach that point, and whether we ever will, is of course still up for debate, but I don't think it's completely unrealistic.
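To make the slider example concrete, here's a rough sketch of what a "reasonable" result might look like: a range input that rescales ingredient quantities in proportion to the chosen servings. The element IDs, base serving count, and ingredient data are all invented for illustration, not taken from any real site.

```typescript
// Hypothetical servings slider: scales ingredient quantities relative to a base recipe.
// Assumes the page has <input type="range" id="servings-slider"> and <ul id="ingredient-list">.

interface Ingredient {
  name: string;
  quantity: number; // amount for the base number of servings
  unit: string;
}

const baseServings = 4;
const ingredients: Ingredient[] = [
  { name: "flour", quantity: 500, unit: "g" },
  { name: "milk", quantity: 250, unit: "ml" },
];

const slider = document.getElementById("servings-slider") as HTMLInputElement;
const list = document.getElementById("ingredient-list") as HTMLUListElement;

// Re-render the ingredient list, scaling each quantity by servings / baseServings.
function render(servings: number): void {
  const factor = servings / baseServings;
  list.innerHTML = ingredients
    .map((i) => `<li>${(i.quantity * factor).toFixed(0)} ${i.unit} ${i.name}</li>`)
    .join("");
}

slider.addEventListener("input", () => render(Number(slider.value)));
render(baseServings);
```

Whether the slider should snap to whole servings, round quantities, or handle "pinch of salt" style ingredients is exactly the kind of detail left ambiguous in the original instruction.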
That's true, and I more or less already use it that way for things like one-off scripts, mock APIs, etc.
I don't think the argument is that AI isn't useful. I think the argument is that it is qualitatively different from a compiler.