I presume that this works by processing the HTML and feeding it to the LLM. What approaches did you take for doing this? Or am I wrong?
Under the "tools" part of the README it shows the following observation tools:
- browser_snapshot_dom
- browser_query
- browser_accessible_tree
- browser_read_text
- browser_screenshot
So most likely the LLM can choose how to "see" the page?
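To make that guess concrete, here is a rough sketch of what "the LLM picks an observation tool" could look like. Only the tool names come from the README; the `PAGE` dict, the handler functions, and `observe` are hypothetical stand-ins I made up for illustration, not the project's actual code.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a loaded page; not the project's real data model.
PAGE = {
    "html": "<main><h1>Hello</h1><button id='go'>Go</button></main>",
    "text": "Hello Go",
}


def snapshot_dom(page: dict) -> str:
    """Return the raw markup (assumed behavior, for illustration only)."""
    return page["html"]


def read_text(page: dict) -> str:
    """Return only the visible text (assumed behavior, for illustration only)."""
    return page["text"]


# Tool names are taken from the README; the handlers are illustrative stubs.
OBSERVATION_TOOLS: Dict[str, Callable[[dict], str]] = {
    "browser_snapshot_dom": snapshot_dom,
    "browser_read_text": read_text,
}


def observe(tool_name: str, page: dict) -> str:
    """Run whichever observation tool the model asked for."""
    if tool_name not in OBSERVATION_TOOLS:
        raise ValueError(f"unknown observation tool: {tool_name}")
    return OBSERVATION_TOOLS[tool_name](page)
```

If it works anything like this, the model would name a tool in its response, the harness would run it, and the result would go back into the context, so a cheap text read can be used where a full DOM snapshot is not needed.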