It can fill forms: the agent can invoke a large number of tools to both observe and interact with a page.

How does it do so? Just DOM manipulation, viewport scanning or something of the sort?