Recently I've come to believe even IA and especially archive.is are ephemeral. I've watched sites I've saved disappear without a trace, except in my self-hosted archives.
A technological conundrum, however, is the fact that I have no way to prove that my archive is an accurate representation of a site at a point in time. Hmmm, or maybe I do? Maybe something funky with cert chains could be done.
There are timestamping services out there, some of which may be free. It should (I think) be possible to basically submit the target site's URL to the timestamping service, and get back a certificate saying "I, Timestamps-R-US, assert that the contents of https://targetsite.com/foo/bar downloaded at 12:34pm on 29/5/2025 hashes to abc12345 with SHA-1", signed with their private key and verifiable (by anyone) with their public key. Then you download the same URL, and check that the hashes match.
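A rough sketch of the "check that the hashes match" step, assuming the timestamping service hands back a hex digest. The function names are made up for illustration, and verifying the service's signature over the attestation is a separate step not shown here. The default is SHA-256 rather than the SHA-1 in the example above, since SHA-1 has known collision attacks.

```python
import hashlib
import hmac


def content_digest(body: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of the downloaded bytes (hypothetical helper)."""
    return hashlib.new(algorithm, body).hexdigest()


def matches_attestation(body: bytes, attested_hex: str,
                        algorithm: str = "sha256") -> bool:
    """True if our own download hashes to the digest the service attested to."""
    return hmac.compare_digest(content_digest(body, algorithm),
                               attested_hex.lower())
```

You would call `matches_attestation(my_download, digest_from_certificate)` on the raw response bytes, before any rendering or re-encoding.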
IIUC the timestamping service needs to independently download the contents itself in order to hash it, so if you need to be logged in to see the content there might be complications, and if there's a lot of content they'll probably want to charge you.
Websites don't really produce consistent content even from identical requests though.
But you also don't need to do this: all you need is a service which will attest that it saw a particular hashsum at a particular time. It's up to other mechanisms to prove what that means.
> But you also don't need to do this: all you need is a service which will attest that it saw a particular hashsum at a particular time. It's up to other mechanisms to prove what that means.
"That URL served a particular hash at a particular time" or "someone submitted a particular hash at a particular time" provide very different guarantees and the latter will be insufficient to prove your archive is correct.
> Websites don't really produce consistent content even from identical requests though.
Often true in practice unfortunately, but to the extent that it is true, any approach that tries to use hashes to prove things to a third party is sunk. (We could imagine a timestamping service that allows some kind of post-download "normalisation" step to strip out content that varies between queries and then hash the results of that, but that doesn't seem practical to offer as a free service.)
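To make the "normalisation" idea concrete, here is a minimal sketch, assuming the varying parts of a page can be matched with a few regexes. The patterns below are invented examples, not a list any real service uses; a real page would need its own rules, which is a big part of why this is hard to offer generically.

```python
import hashlib
import re

# Hypothetical patterns for content that changes between identical requests.
VOLATILE_PATTERNS = [
    re.compile(rb'nonce="[^"]*"'),
    re.compile(rb'<meta name="csrf-token" content="[^"]*">'),
    re.compile(rb"<!--.*?-->", re.DOTALL),
]


def normalise(body: bytes) -> bytes:
    """Strip the volatile parts, then collapse runs of whitespace."""
    for pattern in VOLATILE_PATTERNS:
        body = pattern.sub(b"", body)
    return b" ".join(body.split())


def normalised_digest(body: bytes) -> str:
    """SHA-256 of the normalised page rather than of the raw bytes."""
    return hashlib.sha256(normalise(body)).hexdigest()
```

Both parties would have to agree on exactly the same normalisation rules, otherwise the digests are not comparable.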
> all you need is a service which will attest that it saw a particular hashsum at a particular time
Isn't that what I'm proposing?
sign it with gpg and upload the sig to bitcoin
edit: sorry, that would only prove when it was taken, not that it wasn’t fabricated.
hash the contents
signing it is effectively the same thing. question is how to prove that what you hashed is what was there?
you can't, because unless someone else also has a copy, your hash cannot be verified (since both the hash and the claim come from you).
One way to make this work is to have a mechanism like bitcoin (proof of work), where the proof of work is put into the webpage itself as a hash (made by the original author of that page). Then anyone can verify that the contents weren't changed, and if someone wants to make changes and claim otherwise, they'd have to put in even more proof of work to do it (so not impossible, but costly).
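For what it's worth, a toy version of that proof-of-work idea, assuming a hashcash-style scheme where the author searches for a nonce that makes the SHA-256 of content plus nonce fall below a target. The function names and the `difficulty_bits` parameter are my own assumptions, and 12 bits is far too low to deter anyone; it just keeps the example fast.

```python
import hashlib


def _work_value(content: bytes, nonce: int) -> int:
    """SHA-256 of content followed by an 8-byte nonce, as a big integer."""
    digest = hashlib.sha256(content + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(content: bytes, nonce: int, difficulty_bits: int = 12) -> bool:
    """Cheap check: one hash, compared against the difficulty target."""
    return _work_value(content, nonce) < (1 << (256 - difficulty_bits))


def find_nonce(content: bytes, difficulty_bits: int = 12) -> int:
    """Expensive search: try nonces from 0 upwards until one verifies."""
    nonce = 0
    while not verify(content, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Changing the content invalidates the nonce with overwhelming probability, so whoever edits the page has to redo the search. Note this only makes tampering costly; it says nothing about who did the original work.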
I think there was a way to preserve TLS handshake information in a way that something something you can verify you got the exact response from the particular server? I can’t look it up now though, but I think there was a Firefox add-on, even.
I don't see how this can work. While the handshake uses asymmetric crypto, that step then gives you a symmetric key that is used for the actual content. You need that key to decrypt the content, but if you have it you can also use it to encrypt your own content and substitute it into the encrypted stream.
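A small illustration of why a shared symmetric key proves nothing to a third party. This is not TLS: HMAC-SHA256 stands in for the record protection (real TLS uses AEAD ciphers), and the key and messages are made up. The point is only that whoever holds the session key can authenticate any content they like.

```python
import hashlib
import hmac
import os


def tag(session_key: bytes, message: bytes) -> bytes:
    """Authentication tag computed with the shared symmetric key."""
    return hmac.new(session_key, message, hashlib.sha256).digest()


def third_party_accepts(session_key: bytes, message: bytes, t: bytes) -> bool:
    """What a verifier could check if handed the key, message and tag."""
    return hmac.compare_digest(tag(session_key, message), t)


# Both the server and the client end up holding this after the handshake.
session_key = os.urandom(32)

# What the server actually sent.
server_msg = b"original page"
server_tag = tag(session_key, server_msg)

# What the client can fabricate afterwards with the very same key.
forged_msg = b"fabricated page"
forged_tag = tag(session_key, forged_msg)
```

Both `(server_msg, server_tag)` and `(forged_msg, forged_tag)` pass `third_party_accepts`, so the verifier cannot tell which one came from the server.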
what if, instead of the proof of work being in the page as a hash, the distributed proof of work is that some subset of nodes each download a particular bit of html or json from a particular URI, hash it, and save the contents and the hash to a blockchain-esque distributed database? Subject to a 51% attack, same as any other chain, but still.
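Sketching the agreement step of that idea, assuming each node reports the bytes it fetched and the network only records a digest that a strict majority of nodes saw. The function name and threshold are hypothetical, and the actual chain, networking and storage are left out entirely.

```python
import hashlib
from collections import Counter
from typing import Optional, Sequence


def quorum_digest(observations: Sequence[bytes],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the SHA-256 digest seen by more than `threshold` of the nodes,
    or None if no digest reaches that share."""
    digests = [hashlib.sha256(body).hexdigest() for body in observations]
    if not digests:
        return None
    digest, votes = Counter(digests).most_common(1)[0]
    return digest if votes / len(digests) > threshold else None
```

The 51% caveat shows up directly here: whoever controls more than half of the reporting nodes decides which digest gets recorded. Pages that vary per request would also split the vote and never reach quorum.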