I maintain an open-source project called Linkwarden, and this exact discussion is one of the reasons it exists: teams needed a way to preserve referenced URLs reliably without depending on external services.

It stores webpages in multiple formats (HTML snapshot, screenshot, PDF snapshot, and a fully dedicated reader view) so you’re not relying on a single fragile archive method.

There’s both a hosted cloud plan [1] which directly supports the project, and a fully self-hosted option [2], depending on how much control you need over storage and retention.

[1]: https://linkwarden.app

[2]: https://github.com/linkwarden/linkwarden

Linkwarden is awesome, and with the SingleFile extension it's pretty easy to store pages you can see in your browser but that the scraper gets blocked on.

One question: what's your stance on adding a way to mark articles as read or "archive" them, like other apps that are branded more as read-later tools? You can technically do something similar with tags, but the UX is a bit clunky.

Neat. How does the archive.org integration work?

Does it just POST the URL to them for them to fetch? Or is there any integration/trust that lets you store what you already fetched on the client directly in their archive?