> Avoid detection with built-in anti-bot patches and proxy configuration for reliable web scraping.

And it doesn't care about robots.txt.

Good point. The anti-bot patches here (via Patchright) are about preventing the browser from being detected as automated — things like CDP leak fixes so Cloudflare doesn't block you mid-session. It's not about bypassing access restrictions.

Our main use case is retail price monitoring — comparing publicly listed product prices across e-commerce sites, which is pretty standard in the industry. But fair point, we should make that clearer in the README.

robots.txt is the most basic access restriction, and your tool doesn't even read it while presenting itself as human[0]. It is about bypassing access restrictions.

[0]: https://github.com/lightfeed/extractor/blob/d11060269e65459e...

Regardless, you should still respect robots.txt.

We do respect robots.txt in production - scraping browser providers like BrightData also enforce that.

I will add a PR to enforce robots.txt before the actual scraping.
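For what it's worth, the pre-scrape check is small. A minimal sketch using Python's stdlib `urllib.robotparser` - the agent name `"examplebot"` is a placeholder, not the project's actual User-Agent, and a real implementation would fetch and cache each host's robots.txt rather than take the text as an argument:

```python
from urllib import robotparser

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse rules without a network round-trip
    return rp.can_fetch(user_agent, url)

# Gate every fetch on the check, e.g.:
#   if not allowed(rules, "examplebot", target_url):
#       raise PermissionError(f"robots.txt disallows {target_url}")
```

In production you'd fetch `https://{host}/robots.txt` once per host (via `RobotFileParser.set_url()` + `read()`) and cache the parsed rules.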

How can people believe that you are respecting bot detection in production when your software's README says it can "Avoid detection with built-in anti-bot patches"?

I hear you loud and clear - will replace the stealth browser with plain playwright and remove anti-bot as a feature.

> It's not about bypassing access restrictions.

Yes. It is. You've just made an arbitrary choice not to define it as such.

> I will add a PR to enforce robots.txt before the actual scraping.

Or just follow web standards and define and publish your User-Agent header, so that people can block that as needed.
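Concretely, that just means sending a stable, documented User-Agent on every request. A sketch with the stdlib - the bot name and policy URL here are illustrative placeholders, not the project's real identifiers:

```python
import urllib.request

# Placeholder identity; a real one should name the bot and link to a policy page.
USER_AGENT = "examplebot/1.0 (+https://example.com/bot-policy)"

def make_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the bot via its published User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Operators can then rate-limit or block that token in robots.txt or at the edge, which is the whole point of publishing it.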

You're creating the wrong kind of value. I really hope your company fails, as its success implies a failure of the web in general.

I wish you the best success outside of your current endeavour.