RPA has been a huge pain to work with.
It's almost always a framework around existing tools like Selenium that you constantly have to fight against to get good results from. I was always left with the feeling that I could build something better myself just handrolling the scripts rather than using their frameworks.
Getting Claude integrated into the space is going to be a game changer.
Most RPA work is in dealing with errors and exceptions, not the "happy path". I don't see how Claude's Screen Agent is going to work out there - what do you do when an error pops up and you need to implement specific business logic for how to respond? And what about consistency across many executions, or enterprise accounts? You want a centralized way to control agent behavior. Scripting-based RPA is also much faster and cheaper to run, and more consistent.
Maybe Anthropic should focus on building a flexible RPA primitive we could use to make RPA workflows with, like for example extracting values from components that need scrolling, selecting values from long drop-down menus, or handling error messages under form fields.
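To make the idea concrete, here is a minimal sketch of what one such primitive could look like: selecting a value from a long drop-down that only shows a window of items at a time. Everything here is hypothetical and purely illustrative - `ScrollableList` is an in-memory stand-in for a real widget that would be driven through Selenium or an accessibility API, and `select_from_dropdown` is the kind of reusable building block a workflow author could call.

```python
# Hypothetical sketch of one RPA primitive: picking a value from a long,
# scrollable drop-down. ScrollableList simulates a widget that only
# exposes a window of visible items; a real version would drive the
# browser instead of a Python list.

class ScrollableList:
    """Simulates a dropdown that only exposes a window of visible items."""

    def __init__(self, items, window_size=5):
        self.items = items
        self.window_size = window_size
        self.offset = 0  # index of the first visible item

    def visible_items(self):
        return self.items[self.offset:self.offset + self.window_size]

    def scroll_down(self):
        """Scroll one window further; return False once the end is reached."""
        if self.offset + self.window_size >= len(self.items):
            return False
        self.offset += self.window_size
        return True


def select_from_dropdown(dropdown, target):
    """Scroll until `target` is visible, then return its absolute index.

    Raises LookupError if the value never appears, so the calling
    workflow can attach its own business logic to that failure.
    """
    while True:
        window = dropdown.visible_items()
        if target in window:
            return dropdown.offset + window.index(target)
        if not dropdown.scroll_down():
            raise LookupError(f"{target!r} not found in dropdown")
```

The point is the shape of the API: the scrolling mechanics live in the primitive, and the workflow only sees "select this value or fail in a well-defined way".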
I agree with your post.
Isn't this most programming? I always chuckle when a junior hire looks at my code and says: "It is mostly error checking."

100% this. I am using the open source Ui.vision to automate some business tasks. It works well, but only 10% of the work is automating the main workflow; 90% of the work goes into error and edge-case handling (e.g., internet down, the website to scrape data from is down, some input data has typos or the wrong date format, etc.).
A human can work around all these error cases once she encounters them. Current RPA tools like Uipath or ui.vision need explicit programming for every potential situation. And I see no indication that Claude is doing any better than this.
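For a sense of what that 90% looks like in practice, here is a small hedged sketch of the explicit handling a scripted workflow needs for exactly the failure modes above (network down, site down, malformed input dates). The function names are hypothetical, not from any particular RPA tool:

```python
# Illustrative only: the kind of hand-written error handling that
# dominates scripting-based RPA. Every anticipated failure mode must
# be programmed in explicitly.

import time
from datetime import datetime


def with_retries(fn, attempts=3, delay=0.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call fn(), retrying only on the transient errors we anticipated."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the workflow
            time.sleep(delay)  # back off before the next attempt


def parse_input_date(raw):
    """Accept the date formats we've actually seen in input data; reject the rest."""
    for fmt in ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

Note that both helpers only cover situations someone thought of in advance; an unanticipated fourth date format or a new error class still needs a human to come back and extend the script, which is the gap the comment above is describing.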
For starters, visual automation needs OCR that is essentially 100% reliable before it can work dependably. Even in that very basic "AI" area, Claude, ChatGPT, and Gemini are good, but not good enough yet.
I can see it now, Claude generating expect scripts. 1994 and 2024 will be fully joined.
The big thing I expect at the next level is using Claude to first generate UI-based automation from an end user's instructions, then automatically define a suite of end-to-end tests, confirm with the user "is this how it should work?", and finally use that suite to reimplement the flow from first principles.
I know we're still a bit far from there, but I don't see a particular hurdle that strikes me as requiring novel research.
But does it do any better at soliciting the surprise requirements from the user, who confirms that everything works and then, two months later, reports a production bug because the software isn't correctly performing the different requirements that apply on the first Tuesday of each quarter, requirements you never knew about?
I once had an executive ask to start an incident because he was showing a client the app and a feature that he wanted that had never been spec’d didn’t exist.
So basically, Tog's Paradox in action?