See https://github.com/OpenAdaptAI/OpenAdapt for an open-source alternative that captures operating system accessibility API data and DOM information (along with bitmaps) where available.
We also plan to extract runtime information via COM (on Windows) and AppleScript (on macOS): https://github.com/OpenAdaptAI/OpenAdapt/issues/873
It's super cool to see that something like this already exists! I wonder whether something adjacent will one day become a standard part of major desktop OSes: a dedicated "AI API" that lets models connect to the OS, browse the windows and available actions, issue commands, and so on, removing the bitmap parsing altogether, as this appears to do.
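
To make the idea a bit more concrete, here's a rough Python sketch of what such an "AI API" might look like from the model's side. Everything in it (`Window`, `FakeDesktop`, `list_windows`, `list_actions`, `invoke`) is made up for illustration: it's an in-memory toy, not a real OS interface and not OpenAdapt's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Window:
    """Hypothetical structured view of one window (no pixels involved)."""
    title: str
    # Maps an action name to a callable that performs it and returns a status.
    actions: Dict[str, Callable[[], str]] = field(default_factory=dict)


class FakeDesktop:
    """Stand-in for an OS-provided 'AI API'; purely illustrative."""

    def __init__(self, windows: List[Window]) -> None:
        self._windows = {w.title: w for w in windows}

    def list_windows(self) -> List[str]:
        """Browse the open windows by title."""
        return sorted(self._windows)

    def list_actions(self, title: str) -> List[str]:
        """Browse the actions a given window exposes."""
        return sorted(self._windows[title].actions)

    def invoke(self, title: str, action: str) -> str:
        """Issue a command against a window and return its result."""
        return self._windows[title].actions[action]()


# Toy data standing in for whatever the OS would report.
desktop = FakeDesktop([
    Window("Editor", {"save": lambda: "saved", "close": lambda: "closed"}),
    Window("Browser", {"reload": lambda: "reloaded"}),
])

for title in desktop.list_windows():
    print(title, desktop.list_actions(title))
print(desktop.invoke("Editor", "save"))
```

The point is just that the model would work with names and structured actions instead of screenshots; a real version would presumably sit on top of the existing accessibility APIs.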