The main problem with these approaches is that most sites now are useless without JS or having access to the accessibility tree. Projects like browser-use or other DOM based approaches at least see the DOM(and screenshots).

I wonder if you could turn this into a chrome extension that at least filters and parses the DOM

I actually made a CLI tool recently that uses Puppeteer to render the page including JS, summarizes key info and actions, and enables simple form filling all from a CLI menu. I built it for my own use-cases (checking and paying power bills from CLI), but I'd love to get feedback on the core concept: https://github.com/jadbox/solomonagent

Dude I love this. I've been thinking of doing this exactly this, but for as a screen reader for accessibility reasons.

Thanks, it's alpha at the moment- next feature is complex forms and bug fixing broken actions (downloading). Do give it a spin! Welcome to contribute or drop feedback on the repo :)

True for stuff requiring interaction, but to help their LCP/SEO lots of sites these days render plain html first. It's not "usable" but for viewing it's pretty good