$500 for exceeding 1TB? The problem here isn't the crawlers, it's your price-gouging, extortionate hosting plan. Pick your favourite $5/month VPS platform - I suggest Hetzner with its 20TB limit (if their KYC process lets you in) or DigitalOcean if not (only 1TB included, but overage is only a few bucks extra). Even freaking AWS, known for extremely high prices, is cheaper than that (but still too expensive, so don't use it).
> The problem here isn't the crawlers, it's your price-gouging, extortionate hosting plan.
No, it's both.
The crawlers are lazy, apparently have no caching, and there is no immediately obvious way to instruct/force those crawlers to grab pages in a bandwidth-efficient manner. That being said, I would not be surprised if someone here smugly contradicts me with instructions on how to do just that.
In the near term, if I were hosting such a site I'd be looking into slimming down every byte I could manage, using fingerprinting to serve slim pages to the bots, and exploring alternative hosting/CDN options.
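For what it's worth, here's a rough sketch of the "serve slim pages to the bots" idea. It assumes a plain User-Agent substring match stands in for real fingerprinting, which is a big simplification since crawlers can and do spoof that header. The marker list and the function names `is_probable_bot` and `choose_variant` are made up for illustration.

```python
# Illustrative user-agent substrings only (an assumption, not a vetted list).
# Real fingerprinting would combine more signals than this one header.
BOT_MARKERS = ("gptbot", "claudebot", "ccbot", "bytespider")


def is_probable_bot(user_agent: str) -> bool:
    """Return True if the User-Agent contains any known crawler marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def choose_variant(user_agent: str) -> str:
    """Pick which page variant to serve: 'slim' for probable bots, else 'full'."""
    return "slim" if is_probable_bot(user_agent) else "full"


if __name__ == "__main__":
    print(choose_variant("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(choose_variant("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

Whatever sits in front of the site would call `choose_variant` per request and serve a stripped-down page (no images, no heavy assets) for the "slim" case.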
> The problem here isn't the crawlers,
One of the worst takes I've seen. Yes, that's expensive, but the individuals doing insane amounts of unnecessary scraping are the problem. Let's not act like this isn't the case.
To clarify the math: Netlify bills $50 for each 100GB over the Pro plan limit of 1TB. That's the barrel I'm looking down just this month, before others get the same idea. So yes, I'm squeezed on both sides unless I put the work in to rehost.
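Spelled out as a quick sketch, using the figures above ($50 per 100GB over a 1TB allowance). Two assumptions that are mine, not confirmed anywhere in the thread: 1TB is treated as 1000GB, and a partially used 100GB block is billed as a full block. The function name `overage_cost` is just for illustration.

```python
import math


def overage_cost(used_gb: float,
                 included_gb: float = 1000,   # assumed: 1TB == 1000GB
                 block_gb: float = 100,
                 price_per_block: float = 50) -> float:
    """Overage charge in dollars for a month's bandwidth usage.

    Assumes every started block over the allowance is billed in full.
    """
    over_gb = max(0, used_gb - included_gb)
    blocks = math.ceil(over_gb / block_gb)
    return blocks * price_per_block


if __name__ == "__main__":
    for used in (900, 1000, 1500, 2000):
        print(f"{used}GB used -> ${overage_cost(used)} overage")
```

So the $500 figure corresponds to a full extra 1TB over the limit, i.e. 2TB served in the month.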
I went to a Subway shop that charged $50 per lettuce strip past the first 20. As the worker sprinkled lettuce on my sandwich, I counted anxiously, biting my nails. 19, phew, I'm safe. I think I'll come back here tomorrow.
Tomorrow, someone in front of me asked for extra lettuce. The worker got confused and put it on my sandwich. I was charged $1000. Drat.
> The worker got confused and put it on my sandwich.
No, this is where you're completely and totally incorrect. There is no 'worker accidentally making a human mistake that costs you money' here. This is a 'multi-billion dollar company routinely runs scripts that they KNOW cost you money, but does it anyway because it generates profit for them'. To fix your example:
You RUN a Subway that sells sandwiches. Your lettuce provider charges you $1 per piece of lettuce. Your average customer is given $1 worth of lettuce in their sub. One customer keeps coming in, reaching over the counter, and grabbing handfuls of lettuce. You cannot ban this customer because they routinely put on disguises and ignore your signs saying 'NO EXTRA LETTUCE'. Eventually this bankrupts you, forces you to stop serving lettuce in your subs entirely, or makes you put up bars (e.g., Cloudflare) over your lettuce bins.