If 80% is “done by the AI”, who is responsible for the inevitable failures on the AI’s behalf? Given that inference is wrong some nonzero fraction of the time — in a word… hmm.

How many 9s until you’re comfortable? Even then, knowing that across 1,000 tasks there’s likely at least one foundational issue… how do you audit? “Pretty please do the needful”, then have another agent “please ensure they did the needful”? Or do you review all 1,000 inputs and outputs yourself? Don’t get me wrong, I’m familiar with the “send it” ethos all too well, but at scale it seems like quite the pickle.
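The “9s” intuition above can be made concrete with a quick back-of-envelope calculation. This is just a sketch, assuming tasks fail independently (which real pipelines rarely do — correlated failures make it worse), and the function name is mine, not from any library:

```python
# Hypothetical sketch: probability that at least one of n_tasks fails,
# given a per-task success rate expressed as a number of "nines".
# Assumes independent failures, which is optimistic for real pipelines.

def p_at_least_one_failure(nines: int, n_tasks: int) -> float:
    success = 1 - 10 ** -nines  # e.g. 3 nines -> 0.999 per-task success
    return 1 - success ** n_tasks

for nines in (2, 3, 4, 5):
    p = p_at_least_one_failure(nines, 1000)
    print(f"{nines} nines over 1000 tasks -> P(>=1 failure) ~ {p:.3f}")
```

Even at three 9s (99.9% per task), roughly 63% of 1,000-task batches contain at least one failure — which is why “review the 1,000 outputs” stops being a rhetorical question at scale.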

Genuinely curious how most people weigh these angles… I was once tasked with building a model to do what literally could’ve been a SQL query. When I brought this up, it was met with “well, we need to do it with AI.” I don’t think a human’s gonna want to find that needle in a haystack once 100,000 significant documents are originated… but I don’t have to worry about that one anymore, thank goodness.