This is a great flag and something we want to spend more time experimenting with as we continue to build out the repo.
Right now we have a mixture of the two approaches, but there's a lot of room for improvement.
- When libretto performs the code generation, it initially inspects the page and exercises the network calls/Playwright actions using the `snapshot` and `exec` tools to test them individually. After it has tested all of the individual selectors and thinks it's finished, it creates a script and then runs that script from scratch. Oftentimes the generated script will fail, which triggers libretto to identify the failure, update the code, and repeat this process until the script works. That iteration loop helps make the scripts much more reliable.
- The way our `snapshot` command works is that we send a screenshot + DOM (possibly condensed, depending on size) to a separate LLM and ask it to figure out the relevant selectors. We do this to avoid polluting the main agent's context with the DOM and lots of screenshots. As part of that analyzer's prompt, we tell it to prefer selectors using: data-testid, data-test, aria-label, name, id, role. This just lives in the analyzer prompt, though, and is not deterministic. It'd be interesting to see whether we can improve script quality by adding a hard constraint on the selectors or with different prompting.
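One way to turn that prompt-level preference into a hard constraint would be a deterministic post-filter that ranks whatever candidates the analyzer returns by the stability of the attribute they rely on. A minimal sketch (the priority list mirrors the one in the prompt; the function itself is hypothetical):

```typescript
// Hypothetical deterministic re-ranking of analyzer-proposed selectors.
// Lower index = more stable attribute, per the analyzer prompt's preference order.
const ATTRIBUTE_PRIORITY = ["data-testid", "data-test", "aria-label", "name", "id", "role"];

function rankSelectors(candidates: string[]): string[] {
  const score = (sel: string): number => {
    const idx = ATTRIBUTE_PRIORITY.findIndex(
      (attr) => sel.includes(`[${attr}=`) || (attr === "id" && sel.startsWith("#")),
    );
    // Selectors matching no preferred attribute (e.g. brittle CSS paths) sort last.
    return idx === -1 ? ATTRIBUTE_PRIORITY.length : idx;
  };
  return [...candidates].sort((a, b) => score(a) - score(b));
}

const ranked = rankSelectors([
  "div.main > ul li:nth-child(3) a",
  '[aria-label="Submit order"]',
  '[data-testid="submit-btn"]',
]);
console.log(ranked[0]); // '[data-testid="submit-btn"]'
```

A stricter variant could reject (rather than merely down-rank) positional CSS paths outright and ask the analyzer for another candidate, which would make the constraint enforceable regardless of how the prompt is phrased.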
I'm also curious if you have any guidance for prompt improvements we can give the snapshot analyzer LLM to help it pick more robust selectors right off the bat.