Freezing JS execution between actions is the kind of obvious idea that nobody did properly untill now. Kudos for actually forking Chromium instead of hacking around Playwright like everybody else.
But here's my thought: you're solving the "stale state" problem by making the browser deterministic. Real websites aren't deterministic. WebSOcket pushes, long-polling, background fetches, animations that don't finish — freezing execution doesn't pause the server. The moment you unfreeze, the world may have moved.
90.5% on Mind2Web is great. But Mind2Web tasks are mostly "fill a form, click submit." The brutal failures happen on SPAs with optimistic UI updates, where the DOM says "saved" but the network request hasn't finished. Does ABP handle that case, or does the freeze just delay the confusion?
Genuine question — not trying to tear this down. The architecture is smart. I just wonder if "make the browser simpler for the agent", eventually hit s a wall where you need to make the agent smarter about async instead.
The freeze sometimes does capture in between states. What I've seen the agent does in those cases is that it recognizes it's in between states and calls browser_wait(). Where the agent goes off the rails isn't a snapshot in the middle of a state transition, (it's smart enough to know to retry in that case), it's when the DOM changes after the agent believes the page has settled.
For async, lots of people smarter than me working on the smarter agent problem. Though there's a latency floor with inference due to prompt processing, and output generation. Without tools like ABP, the LLM is always aiming at a moving target.