Can it perform DOM manipulation as well, like filling in forms, or would the LLM response need to be structured for each specific site it's used on? And would an LLM be able to perform such a task?
It can fill forms: the agent can invoke a large number of tools to both observe and interact with a page.
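To give a rough idea of the "observe, then act" pattern, here is a toy sketch in Python. It is not the project's actual API: `Page`, `observe`, `fill_field`, `submit`, `run_agent` and `scripted_policy` are all hypothetical names, the "page" is just a dict of form fields, and a scripted function stands in for the LLM that would normally pick the next tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Page:
    """Toy stand-in for a web page: form field names mapped to their values."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False


def observe(page: Page) -> List[str]:
    """Observation tool (hypothetical): list the fields that are still empty."""
    return [name for name, value in page.fields.items() if value == ""]


def fill_field(page: Page, name: str, value: str) -> None:
    """Interaction tool (hypothetical): write a value into one form field."""
    if name not in page.fields:
        raise KeyError(f"no such field: {name}")
    page.fields[name] = value


def submit(page: Page) -> None:
    """Interaction tool (hypothetical): mark the form as submitted."""
    page.submitted = True


# Registry of interaction tools the agent is allowed to call by name.
TOOLS: Dict[str, Callable[..., None]] = {"fill_field": fill_field, "submit": submit}


def run_agent(page: Page, choose_action: Callable[[List[str]], dict], max_steps: int = 10) -> Page:
    """Loop: observe the page, ask the policy for a tool call, execute it."""
    for _ in range(max_steps):
        empty = observe(page)
        action = choose_action(empty)  # in a real agent, the LLM decides this
        TOOLS[action["tool"]](page, **action["args"])
        if page.submitted:
            break
    return page


def scripted_policy(known_values: Dict[str, str]) -> Callable[[List[str]], dict]:
    """Stand-in for the LLM: fill the first empty field, submit when none remain."""
    def choose(empty_fields: List[str]) -> dict:
        if empty_fields:
            name = empty_fields[0]
            return {"tool": "fill_field", "args": {"name": name, "value": known_values[name]}}
        return {"tool": "submit", "args": {}}
    return choose


if __name__ == "__main__":
    form = Page(fields={"email": "", "name": ""})
    run_agent(form, scripted_policy({"email": "a@example.com", "name": "Ada"}))
    print(form)
```

The point is that nothing here is site-specific: the policy only sees what the observation tool reports and responds with a generic tool call.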
How does it do so? Just DOM manipulation, viewport scanning or something of the sort?