It's a move to block competitor AI agents while securing access for your own, a classic ladder kick. The market for autonomous agents providing services and doing online work will be gigantic, so unless you want your own bots locked out of, e.g., properties guarded by Amazon, CloudFlare, Microsoft, etc., you will need a bargaining chip.
As someone who uses AI agents, this makes me want to install a browser plugin for "public windows" that just archives everything I see, and then farms out clicks for content that is missing from those sites.
The result of this would be uploaded to a bot-friendly alternative to archive.org.
That exists! Check out Hoardy-Web. https://oxij.org/software/hoardy-web/
Its whole point is undetectable archiving because it just saves what your browser already sees.
Nice, as I understand it, it's similar to ArchiveBox + its web extension.
Now to be honest, while archiving pages straight from the browser view is optimal, from a security standpoint I'm not sure I want a random web extension sitting in everything I see.
I would rather have a local proxy doing it. Maybe something like the Internet Archive's warcprox [0]. Haven't tried it yet.
- [0] https://github.com/internetarchive/warcprox
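For anyone curious, a minimal sketch of what running warcprox as a local archiving proxy looks like (port, output directory, and prefix are arbitrary choices here, and the CA certificate filename warcprox generates is hostname-dependent, so that part is an assumption):

```shell
# Install warcprox and run it on localhost:8000, writing gzipped WARCs to ./warcs
pip install warcprox
warcprox -p 8000 -d ./warcs -z -n my-browsing

# Then point the browser at it (Firefox: Settings -> Network Settings ->
# Manual proxy configuration, HTTP proxy 127.0.0.1 port 8000, also use for HTTPS).
# For HTTPS interception to work, import the CA certificate that warcprox
# generates in its working directory on first run into the browser's trust store.
```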
For a short time I had warcprox sitting behind my Firefox and auto-feeding its output to pywb. It seemed to work, but connections started failing randomly once warcprox had been running for more than a few hours to days. I'm not sure whether the issue was with pywb or warcprox, but some URLs I did browse in Firefox were missing, and many dynamic pages couldn't be replayed at all.
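For anyone wanting to try the same pipeline, a sketch of feeding warcprox's output into pywb for replay (the collection name "browsing" and the ./warcs path are assumptions matching a local warcprox setup):

```shell
# Install pywb, create a collection, and add warcprox's WARC files to it
pip install pywb
wb-manager init browsing
wb-manager add browsing ./warcs/*.warc.gz

# Serve the archive locally; pywb's wayback server listens on port 8080 by
# default, so replay is at http://localhost:8080/browsing/
wayback
```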