There are many factors, but the biggest is simply that there weren't many search companies, and they weren't that well capitalised. That meant there wasn't much competition for "freshness" in your results. There are many, many AI companies, and even more AI data companies supplying the data to the ones doing the actual training.
Finally, search engines don't actually cache all the text. They do something closer to calculating embeddings/keywords, plus things like PageRank, which only uses links. AI companies, by contrast, want ALL the text/image/video data, and it's too expensive to store all of it. It is, however, cheap to download it again every time you need it (data ingress is usually free, as opposed to data egress).
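To illustrate the point that something like PageRank only needs the link graph and never the page text, here's a minimal power-iteration sketch in Python. It's the simplified textbook formulation (damping factor 0.85, "dangling" pages with no outlinks spread their rank evenly), not what any real search engine runs, and the three-page graph is made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a graph {page: [outlinks]}.

    Note the input is only the link structure: no page text is needed.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "c" is linked to by both "a" and "b".
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))
```

The index only has to keep a small score per page (and the link graph used to compute it), which is far cheaper to store than the full text/media an AI training pipeline wants.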