Freezing the browser at every step is a very good approach. I am also working on an agent browser, which uses wireframe snapshots instead of screenshots to reduce token cost: https://github.com/agent-browser-io/browser

@theredsix and you should collaborate.

Your tool's method of returning element references is clever. It should greatly improve how an LLM handles page components, and cut token cost as well.