After mucking around with various easy to use options my lack of trust[1] pushed me into a more-complicated-but-at-least-under-my-control-option: syncthing+restic+s3 compatible cloud provider.
Basically it works like this:
- I have syncthing moving files between all my devices. The larger the device, the more stuff I move there[2]. My phone only has my keepass file and a few other docs, my gaming PC has that plus all of my photos and music, etc.
- All of this ends up on a raspberry pi with a connected USB harddrive, which has everything on it. Why yes, that is very shoddy and short term! The pi is mirrored on my gaming PC though, which is awake once every day or two, so if it completely breaks I still have everything locally.
- Nightly a restic job runs, which backs up everything on the pi to an s3 compatible cloud[3], and cleans out old snapshots (30 days, 52 weeks, 60 months, then yearly)
- Yearly I test restoring a random backup, both on the pi, and on another device, to make sure there is no required knowledge stuck on there.
This is was somewhat of a pain to setup, but since the pi is never off it just ticks along, and I check it periodically to make sure nothing has broken.
[1] there is always weirdness with these tools. They don't sync how you think, or when you actually want to restore it takes forever, or they are stuck in perpetual sync cycles
[2] I sync multiple directories, broadly "very small", "small", "dumping ground", and "media", from smallest to largest.
[3] Currently Wasabi, but it really doens't matter. Restic encrypts client side, you just need to trust the provider enough that they don't completely collapse at the same time that you need backups.
I also have a lil script that rolls dice on restic snapshot, then lists files and picks a random set to restore to /dev/null.
I still trust restic checksums will actually check whether restore is correct, but that way random part of storage gets tested every so often in case some old pack file gets damaged
That's a really good idea, imma steal it.
We need to talk about The Cone of Backups(tm), which you and I seem to have separately derived!
Props for getting this implemented and seemingly trusted... I wish there was an easier way to handle some of this stuff (eg: tiny secure key material => hot syncthing => "live" git files => warm docs and photos => cold bulk movies, isos, etc)... along with selective "on demand pass through browse/fetch/cache"
They all have different policy, size, cost, technical details, and overall SLA/quality tradeoffs.
Does syncthing work yet?
~ 5 years ago, I had a development flow that involved a large source tree (1-10K files, including build output) that was syncthing-ed over a residential network connection to some k8s stuff.
Desyncs/corruptions happened constantly, even though it was a one-way send.
I've never had similar issues with rsync or unison (well, I have in unison, but that's two-way sync, and it always prompted to ask for help by design).
Anyway, my decade-old synology is dying, so I'm setting up a replacement. For other reasons (mostly a decade of systemd / pulse audio finding novel ways to ruin my day, and not really understanding how to restore my synology backups), I've jumped ship over to FreeBSD. I've heard good things about using zfs to get:
saniod + syncoid -> zfs send -> zfs recv -> restic
In the absence of ZFS, I'd do:
rsync -> restic
Or:
unison <-> unison -> restic.
So, similar to what you've landed on, but with one size tier. I have docker containers that the phone talks to for stuff like calendars, and just have the source of the backup flow host my git repos.
One thing to do no matter what:
Write at least 100,000 files to the source then restore from backup (/ on a linux VM is great for this). Run rsync in dry run / checksum mode on the two trees. Confirm the metadata + contents match on both sides. I haven't gotten around to this yet with the flow I just proposed. Almost all consumer backup tools fail this test. Comments here suggest backblaze's consumer offering fails it badly. I'm using B2, but I haven't scrubbed my backup sets in a while. I get the impression it has much higher consistency / durability.
I will say I specifically don't sync git repos (they are just local and pushed to github, which I consider good enough for now), and I am aware that syncthing is one more of those tools that does not work well with git.
syncthing is not perfect, and can get into weird states if you add and remove devices from it for example, but for my case it is I think the best option.
I've personally had no major issues with syncthing, it just works in the background, the largest folder I have synced is ~6TB and 200k files which is mirroring a backup I have on a large external.
One particular issue I've encountered is that syncthing 2.x does not work well for systems w/o an SSD due to the storage backend switching to sqlite which doesn't perform as well as leveldb on HDDs, the scans of the 6TB folder was taking an excessively long time to complete compared to 1.x using leveldb. I haven't encountered any issues with mixing the use of 1.x and 2.x in my setup. The only other issues I've encountered are usually related to filename incompatibilites between filesystems.
Anecdotally, I've been managing a Syncthing network with a file count in the ~200k range, everything synced bidirectionally across a few dozen (Windows) computers, for 9 years now; I've never seen data loss where Syncthing was at fault.
Good to know. I wonder what the difference is. We were doing things like running go build inside the source directory. Maybe it can't handle write races well on Linux/MacOS?