AFAIK they respect robots.txt at crawl time, and later, when using the data, they re-check robots.txt and exclude the data if it has since been updated to deny access. They do further data filtering too, but for that you'd better check the technical report.
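To make the "re-check at use time" idea concrete, here is a rough sketch using Python's standard `urllib.robotparser`. This is not their actual pipeline; the names `is_allowed`, `recheck`, and the `ExampleBot` user agent are made up for illustration, and I'm assuming the current robots.txt contents have already been fetched and are keyed by host.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def recheck(urls, current_robots, user_agent):
    """Keep only the URLs that the *current* robots.txt still allows.

    current_robots is a hypothetical dict mapping host -> latest robots.txt text.
    A host with no entry is treated as having no restrictions (an assumption).
    """
    kept = []
    for url in urls:
        host = urlsplit(url).netloc
        robots_txt = current_robots.get(host, "")
        if is_allowed(robots_txt, user_agent, url):
            kept.append(url)
    return kept


if __name__ == "__main__":
    crawled = ["https://a.example/page", "https://b.example/page"]
    # a.example has since updated its robots.txt to deny everything.
    latest = {"a.example": "User-agent: *\nDisallow: /\n"}
    print(recheck(crawled, latest, "ExampleBot"))
```

The point is just that the check runs twice: once against the robots.txt seen during the crawl, and again against the latest version before the data is used, so a site that opts out later still gets dropped.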