This seems to store the zfs send stream. That's a bad idea.
> Incremental ZFS send streams do not have any of these properties and full ZFS send streams only have a few of them. Neither full nor incremental streams have any resilience against damage to the stream; a stream is either entirely intact or it's useless. Neither has selective restores or readily available indexes. Incremental streams are completely useless without everything they're based on. All of these issues will sooner or later cause you pain if you use ZFS streams as a backup format.
https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSSendNotA...
If you’re looking to ZFS send streams to give you many of ZFS’s guarantees, then absolutely, it’s a bad idea.
However, ZFS replication was originally designed around the assumption, and the use case, that organizations would store the send streams as opaque blobs on tape. That history is part of why storing send streams as blobs is still a thing people do.
There are some use cases where this makes sense. I’ve stored full send streams of archived ZFS file systems in S3(-compatible services) where integrity is handled at the platform level. In that use case I didn’t benefit from having every copy of the filesystem in question running on live storage media, and incremental sends/snapshots weren’t on the table. (I also SHA-checksummed the resulting files and did restore tests; a rough sketch of that workflow is below.)
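The pool, dataset, and bucket names below are placeholders, and zstd and the aws CLI are just what I’d reach for, not the only options:

    # Full send of an archival snapshot, compressed, with a SHA-256
    # recorded alongside the stream so a restore can be verified first.
    zfs snapshot tank/archive@2024-01-01
    zfs send tank/archive@2024-01-01 | zstd -T0 > archive-2024-01-01.zfs.zst
    sha256sum archive-2024-01-01.zfs.zst > archive-2024-01-01.zfs.zst.sha256

    # Upload stream and checksum to S3 (any S3-compatible client works).
    aws s3 cp archive-2024-01-01.zfs.zst s3://my-zfs-archive/
    aws s3 cp archive-2024-01-01.zfs.zst.sha256 s3://my-zfs-archive/

    # Restore test: pull the stream back, verify, receive into a scratch dataset.
    aws s3 cp s3://my-zfs-archive/archive-2024-01-01.zfs.zst .
    sha256sum -c archive-2024-01-01.zfs.zst.sha256
    zstd -dc archive-2024-01-01.zfs.zst | zfs receive -v tank/restore-test

The checksum doesn’t make the stream any more resilient; it just tells you before a restore attempt whether the blob is still the bytes you wrote, which is exactly the all-or-nothing trade-off the linked post is warning about.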
There is also a misconception, frequently brought up in the ZFS community, that the send stream format isn’t stable between versions and can’t be relied upon in future, but it absolutely is stable. In fact, the ZoL manpage for send explicitly states that it is. As with anything in ZFS, though, you want to move versions forward or not at all, rather than backward.
We write blobs/objects to ZFS tape volumes. It gives us an extra layer of defense against ransomware attacks and satisfies our 3-2-1 requirement. We make the blobs transparent with some metadata tags; the objects are recorded in the catalog and we can pull individual files out of a blob. DeepSpace Storage manages the tape gateway and the catalog for the objects. Short answer: yes, storing send streams to tape is a doable, robust workflow.
Finally! Been looking for this for a long time. File-based backups for large Pg databases are not very scalable (even incremental ones); having this in my toolkit would be amazing.
I'm not sure I follow. Wouldn't this be file-based (ZFS-dataset-based) incremental backups? I don't think this has anything to do with PostgreSQL other than copying the style of pgBackRest.
This uses `zfs send @snapshot`, which is block-level, not file-level.
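Roughly what that looks like in practice (dataset and snapshot names are placeholders):

    # Block-level incremental: only blocks that changed between the two
    # snapshots go into the stream, regardless of the size of the files above them.
    zfs snapshot tank/pgdata@base
    # ... the database keeps running, a few rows change ...
    zfs snapshot tank/pgdata@next
    # (assumes @base was already sent/received on the backup side)
    zfs send -i tank/pgdata@base tank/pgdata@next | zfs receive backup/pgdata

So a one-byte change inside a 1 GB Postgres segment shows up in the stream as roughly one modified record (the dataset's recordsize, 128K by default and often tuned down to 8K-16K for Postgres), not as 1 GB.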
Oh! So the issue with large Postgres backups is the number of files?
No. Postgres stores table data in segment files of up to 1 GB each. When you change just one byte in a table, the whole 1 GB file is modified (effectively just a 1-byte change), and your file-based backup tool now has to upload 1 GB of data to capture 1 byte of actual change.
That's a fair point, and it's a known challenge with file-based backups on systems like Postgres. That said, some backup systems implement chunk-level deduplication and content-addressable storage, which can significantly reduce the amount of data actually transferred, even when large files change slightly.
For example, tools like Plakar (contributor here) split data into smaller immutable chunks and only store the modified ones, avoiding full re-uploads of 1GB files when only a few bytes change.
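To make the idea concrete, here is a toy sketch of a content-addressable chunk store in shell. This is not Plakar's actual format: it uses fixed-size chunks named by their SHA-256, so rerunning it after a small edit only stores the chunks that differ. Real tools use content-defined chunking, compression, and encryption on top, but Postgres's page-aligned writes dedupe reasonably well even with fixed-size chunks.

    # Toy content-addressable chunk store: 1 MiB fixed-size chunks,
    # each stored under its SHA-256 hash. "Upload" is just a cp here.
    store_chunks() {
        mkdir -p chunks
        split -a 4 -b 1M "$1" /tmp/chunk.
        for c in /tmp/chunk.*; do
            h=$(sha256sum "$c" | cut -d' ' -f1)
            [ -e "chunks/$h" ] || cp "$c" "chunks/$h"   # only new chunks are stored
            echo "$h"                                   # manifest: ordered list of chunk hashes
        done > "$(basename "$1").manifest"
        rm -f /tmp/chunk.*
    }

    # Hypothetical 1 GB Postgres segment file; change one row, rerun,
    # and only a handful of new chunk files appear under chunks/.
    store_chunks /var/lib/postgresql/data/base/16384/16397

Restoring is just concatenating the chunks listed in the manifest back together in order.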
They fixed that in pgbackrest a while ago: https://pgbackrest.org/user-guide.html#backup/block
It was a major pain point for my backups for years.
Does that work with S3, etc.? I don't remember them allowing partial file uploads.
I believe so, because it is done in conjunction with their file bundling feature and doesn't rely on support from the blob storage backend.
They create a new file with the diffs of a bundle of Postgres files, and upload that to blob storage.
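If anyone wants to try it: block incremental is opt-in and rides on top of bundling. Going from memory of that user guide, the configuration looks roughly like the sketch below; the option names and config path should be double-checked against the linked page before relying on them.

    # Assumed pgbackrest.conf additions (verify option names and config path
    # against the linked user guide): bundling plus block incremental for repo1.
    printf '%s\n' '[global]' 'repo1-bundle=y' 'repo1-block=y' >> /etc/pgbackrest/pgbackrest.conf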