Op here, happy to answer any question!

How does it compare with https://agent-browser.dev/ ? It would be great if you could add it to your table: https://github.com/theredsix/agent-browser-protocol?#compari...

agent-browser's biggest selling point is a CLI wrapper around CDP/puppeteer for context management. It'll have mostly the same pros/cons as CDP on the table.

Updated the table!

Have you considered removing all headless traits so the agent won't be easily detected, like what Browserbase did here?

https://www.browserbase.com/blog/chromium-fork-for-ai-automa...

It runs in headful mode, and all control signals are passed in as system events, so it bypasses the problems Browserbase identified.

Glad to hear that, but being able to run the browser in headless mode would be very helpful in an agentic setting (think parallel agents operating browsers in the background). Since you're already patching Chromium, that might be a great addition to the feature list :)

Yes agreed, added to the roadmap!

Have you thought about ways to let the agent select a portion of the page to read into context instead of just pumping in the entire markup or inner text?

I had good luck letting Claude use an XML parser to get a tree of the file, then write XPath selections to grab what it needed.
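That workflow can be sketched roughly like this (a toy example with made-up, well-formed markup; real HTML usually needs an HTML-to-XML cleanup pass first, and Python's stdlib ElementTree only supports a small XPath subset — lxml gives you the full spec):

```python
# Parse the page into a tree, then run an agent-written XPath-style
# query against it instead of feeding the whole document into context.
import xml.etree.ElementTree as ET

html = """
<html>
  <body>
    <nav><a href="/">Home</a></nav>
    <main>
      <article><h1>Pricing</h1><p>Plans start at $9/mo.</p></article>
    </main>
  </body>
</html>
"""

root = ET.fromstring(html)

# A query the agent might write: grab just the article body, not the page.
for p in root.findall(".//article/p"):
    print(p.text)
```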

Hmm, like adding an optional CSS selector for targeting?

No, like presenting the agent with an outline of the markup, a much abbreviated version. It works better with XML since property names are tags themselves. XPath is an alternative to document.querySelectorAll (if you've never used XPath, you should really check it out: it's much better than CSS selectors, which are mostly hierarchical with a few sibling combinators. XPath is a full graph-traversal spec, you can conditionally walk down one branch, accumulate an item, and walk backwards from there if you want! Really underutilized imo, just because it's 90s tech and people assume we weren't dealing with knowledge graphs back then, so they keep inventing new ways to retrieve sub-documents instead of reading the XML standard).
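One way such an "abbreviated outline" could look — this is just a sketch, the depth limit and the id/class hints are my own assumptions about what's useful to show an agent:

```python
# Produce a depth-limited outline of the markup: tag names plus an
# id/class hint, so the agent sees structure without the full content.
import xml.etree.ElementTree as ET

def outline(el, depth=0, max_depth=3):
    """Return an abbreviated tree as indented lines of tag names."""
    if depth > max_depth:
        return []
    hint = el.get("id") or el.get("class") or ""
    lines = ["  " * depth + f"<{el.tag}{' ' + hint if hint else ''}>"]
    for child in el:
        lines.extend(outline(child, depth + 1, max_depth))
    return lines

doc = ET.fromstring(
    "<html><body><nav class='topbar'><a>Home</a></nav>"
    "<main><article id='post'><h1>Title</h1><p>Body text</p></article></main>"
    "</body></html>"
)
print("\n".join(outline(doc)))
```

The agent reads this outline, then writes a query against the branch it cares about.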

Back to the point: it makes more sense to me to tell the LLM the schema of the data and what query language it can use to access it, and let it decide how to retrieve data, instead of doing RAG or bulk context stuffing.
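As a hypothetical sketch of that framing (the prompt wording and the `build_prompt` helper are my own invention, not anything from the protocol under discussion):

```python
# Instead of stuffing the page into context, hand the model the outline
# plus the query language it may use, and let it decide what to fetch.
def build_prompt(outline_text: str, task: str) -> str:
    return (
        "You can read parts of this document by replying with an XPath "
        "expression; I will run it and return the matching nodes.\n\n"
        f"Document outline:\n{outline_text}\n\n"
        f"Task: {task}\n"
        "Reply with a single XPath expression."
    )

prompt = build_prompt(
    "<html>\n  <body>\n    <article id='post'>",
    "Extract the article text",
)
print(prompt)
```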

The XPath idea sounds great in theory, but it falls apart fast on the modern web. Most sites (React/Vue/Tailwind) generate classes like div.flex-col.xg-9a, and the DOM structure changes completely on every deploy. The agent just gets stuck trying to write an XPath that breaks on the very next page refresh. Feeding it the visual state, like the author does, is way more reliable.