sorry, i didn't mean to say that screenshotting is the only thing this agent does, just that it's one thing my agent does that has this neat property. i also have a host of other things going on for when it needs to grab and understand the contents of the page. the screenshot is used in conjunction with the html to navigate and find things. but it's also handling the things this particular test tries (hidden divs, aria-hidden, etc.). it also tries to tell the model which content is trusted and which is untrusted.
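to make the hidden-content part concrete, here's a minimal sketch of that kind of filtering. it is not the agent's actual code. python with only the standard library is used for illustration, and `visible_text`, `VisibleTextExtractor` and `_is_hidden` are made-up names. it only looks at the `hidden` attribute, `aria-hidden="true"` and inline styles, so it assumes reasonably well-formed html and will miss anything hidden via stylesheets or scripts.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _is_hidden(attrs):
    """Return True if the tag's own attributes mark it as hidden."""
    a = {name: (value or "") for name, value in attrs}
    style = a.get("style", "").replace(" ", "").lower()
    return (
        "hidden" in a
        or a.get("aria-hidden", "").lower() == "true"
        or "display:none" in style
        or "visibility:hidden" in style
    )


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside a hidden subtree."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while we are inside a hidden element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Once inside a hidden element, every nested element stays hidden.
        if self.hidden_depth or _is_hidden(attrs):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

the idea is that only the output of something like `visible_text` gets handed to the model as page content, and even then it would still be labelled as untrusted.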

but the big thing i have in here is simply a cross-domain check: if the agent is about to navigate away from the current domain, we alert the user to the domain change. this is all in a browser context too, so the browser's csrf protection is also being relied on. but it's the cross-domain navigation i'm really worried about, and it's the part i'm trying to make sure i've gotten super hardened. admittedly this is the trickiest part in a browser. i feel like browsers are going to need a new "non-origin" kind of flow that knows an agent is browsing and does something like blocking and confirming natively.
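a rough sketch of what a check like that could look like, under some assumptions: it compares full origins (scheme, host, port), which is stricter than just "domain", it treats any non-http(s) target as needing confirmation, and `origin` and `needs_confirmation` are hypothetical names, not the agent's real code. python is used only for illustration.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port) with case and default ports normalized."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def needs_confirmation(current_url, target):
    """True if navigating from current_url to target should pause and ask the user."""
    # Resolve relative and scheme-relative targets the way a link would be resolved.
    resolved = urljoin(current_url, target)
    try:
        current, destination = origin(current_url), origin(resolved)
    except ValueError:
        # Malformed URL (e.g. a bad port): fail closed and ask.
        return True
    if destination[0] not in DEFAULT_PORTS:
        # javascript:, data:, file:, etc. are never treated as same-origin here.
        return True
    return current != destination
```

for example, from `https://shop.example.com/cart` a target of `/checkout` passes silently, while `//evil.example.net/x` or `https://shop.example.com@evil.example.net/` would trigger the prompt. a check like this would presumably also have to be re-run on every redirect hop, not just the initial click, which is part of why doing it outside the browser itself is tricky.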
