No, I mean presenting the agent with an outline of the markup, a much-abbreviated version. I guess this works much better with XML, since property names are tags themselves. XPath is an alternative to doing document.querySelectorAll (and if you've never used XPath, you should really check it out: it's much better than plain query selectors on CSS rules, which are mostly hierarchical with a few sibling combinators). XPath is a full graph-traversal spec: you can conditionally walk down one branch, accumulate an item, and walk backwards from there if you want. Really underutilized imo, just because it's 90s tech and people assume we weren't dealing with knowledge graphs back then, so they keep inventing new ways to retrieve sub-documents instead of reading the XML standard.
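A minimal sketch of the idea, using Python's stdlib `xml.etree.ElementTree` (which implements only a subset of XPath 1.0; the document and tag names here are made up for illustration):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <shelf id="a">
    <book year="1999"><title>Infinite Jest</title></book>
    <book year="2003"><title>Quicksilver</title></book>
  </shelf>
</library>
""")

# Predicate on an attribute, then descend one more step: this part of
# XPath works even in the stdlib's limited engine.
titles = [t.text for t in doc.findall(".//book[@year='2003']/title")]
print(titles)  # ['Quicksilver']

# Full XPath engines (lxml, browsers via document.evaluate) add the
# reverse/sideways axes the comment is talking about, e.g.:
#   //title[.='Quicksilver']/ancestor::shelf
#   //book[@year='2003']/preceding-sibling::book
# i.e. match a node deep in the tree, then walk back up or sideways.
```

The stdlib engine only covers the forward, hierarchical part; the "walk backwards" axes (`ancestor::`, `preceding-sibling::`) need a full XPath implementation such as lxml.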
Back to the point: it makes more sense to me to tell the LLM the schema of the data and what query language it can use to access it, and let it decide how to retrieve the data, instead of doing RAG or bulk context stuffing.
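One way this could look in practice, as a rough sketch: collapse repeated siblings so the agent sees the shape of the document rather than its contents, then hand it that outline plus "you may query with XPath". The `outline` helper and the sample feed below are both made up for illustration:

```python
import xml.etree.ElementTree as ET

def outline(node, depth=0):
    """One line per distinct child tag at each level, repeats collapsed."""
    lines = []
    seen = set()
    for child in node:
        if child.tag in seen:
            continue  # same-named siblings share a schema; show it once
        seen.add(child.tag)
        attrs = " ".join(f"@{k}" for k in child.attrib)
        lines.append("  " * depth + f"<{child.tag}> {attrs}".rstrip())
        lines.extend(outline(child, depth + 1))
    return lines

doc = ET.fromstring(
    "<feed><entry id='1'><title>a</title></entry>"
    "<entry id='2'><title>b</title></entry></feed>"
)
print("\n".join(outline(doc)))
# <entry> @id
#   <title>
```

A thousand `<entry>` elements compress to two outline lines; the agent then writes its own query (say, `.//entry[@id='2']/title`) instead of receiving the whole document in context.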
The XPath idea sounds great in theory, but it falls apart quickly on the modern web. Most sites (React/Vue/Tailwind) generate classes like div.flex-col.xg-9a, and the DOM structure can change completely on every deploy. The agent just gets stuck writing XPaths that break on the very next page refresh. Feeding it the visual state, like the author does, is way more reliable.
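A toy illustration of that churn problem (class names are invented, echoing the comment; stdlib `ElementTree` stands in for a real DOM):

```python
import xml.etree.ElementTree as ET

# Same page, two deploys: the utility-class hash changed, nothing else did.
before = ET.fromstring(
    '<body><div class="flex-col xg-9a"><span>price</span></div></body>'
)
after = ET.fromstring(
    '<body><div class="flex-col qq-3f"><span>price</span></div></body>'
)

# A class-anchored query written against the first deploy...
q = ".//div[@class='flex-col xg-9a']/span"

hits_before = before.findall(q)  # finds the span
hits_after = after.findall(q)    # finds nothing after the redeploy
print(len(hits_before), len(hits_after))
```

Note that XPath's `@class` is an exact string match on the whole attribute, which makes class-anchored queries even more fragile than CSS class selectors.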