In my experience, Google (among others) plays nice. Just put a "User-agent: *" / "Disallow: /" block in your robots.txt, and they won't bother you again.
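For reference, the minimal robots.txt that tells every well-behaved crawler to stay away looks like this:

    User-agent: *
    Disallow: /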

My current problem is OpenAI, which crawls aggressively and ignores every limit (426, 444, whatever you throw at them), and botnets from East Asia that use one IP per request, but thousands of IPs in total.
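For what it's worth, here's a rough nginx sketch of the 444 approach, matching OpenAI's published crawler user agents (GPTBot, ChatGPT-User; adjust the patterns to taste). It does nothing against the botnets, since those spoof ordinary browser UAs:

    # In the http block: flag requests whose User-Agent matches
    # OpenAI's published crawler strings.
    map $http_user_agent $is_openai_bot {
        default        0;
        ~*GPTBot       1;
        ~*ChatGPT-User 1;
    }

    server {
        # ...
        # 444 is nginx-specific: close the connection without
        # sending any response at all.
        if ($is_openai_bot) {
            return 444;
        }
    }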