Heh, BrowserBase and Browser-Use exist specifically because this is a harder problem than it looks. Any approach will work for the first couple of actions. The hard parts are long chains of actions that depend on the results of previous actions, compressing the context and knowing what to send, and getting your tools to work across all the edge cases (e.g. date picker fields, file upload fields, cross-origin iframes).
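
To make the context-compression point a bit more concrete, here is a rough sketch of one naive approach: keep the last few steps in full and truncate the results of older ones before building the next prompt. This is not how BrowserBase or Browser-Use do it; `Step`, `compress_history`, `keep_recent`, and `max_chars` are all made-up names for illustration, and a real agent would likely summarize rather than just truncate.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str  # hypothetical action string, e.g. "click #submit"
    result: str  # full observation returned by the browser tool


def compress_history(steps: list[Step], keep_recent: int = 3, max_chars: int = 80) -> list[str]:
    """Return one prompt line per step: older results truncated, recent ones in full."""
    cutoff = max(len(steps) - keep_recent, 0)  # steps before this index count as "old"
    lines = []
    for i, step in enumerate(steps):
        if i < cutoff and len(step.result) > max_chars:
            result = step.result[:max_chars] + "..."  # naive truncation, an assumption
        else:
            result = step.result
        lines.append(f"{i + 1}. {step.action} -> {result}")
    return lines
```

Even this toy version shows the tradeoff: truncate too aggressively and a later action loses the earlier result it depended on.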