Hacker News

I don’t think this is the right way to see this.

Why should a file backup solution adapt to work with git? Or any application? It should not try to understand what a git object is.

I’m paying to copy files from a folder to their servers just do that. No matter what the file is. Stay at the filesystem level not the application level.

noirscape 10 hours ago [ - ]

I'm not saying Backblaze should adapt to git; the issue isn't application related (besides git being badly configured by default; there's a solution with git gc, it's just that git gc basically never runs).

It's that to back up a folder on a filesystem, you need to traverse that folder and check every file in that folder to see if it's changed. Most filesystem tools usually assume a fairly low file count for these operations.

Git, rather unusually, tends to produce a lot of files in regular use; before packing, every commit/object/branch is simply stored as a file on the filesystem (branches only as pointers). Packing fixes that by compressing commit and object files together, but it's not done by default (only after an initial clone or when the garbage collector runs). Iterating over a .git folder can take a lot of time in a place that's typically not very well optimized (since most "normal" people don't have thousands of tiny files in their folders that contain sprawled out application state.)

The correct solution here is either for git to change, or for Backblaze to implement better iteration logic (which will probably require special handling for git..., so it'd be more "correct" to fix up git, since Backblaze's tools aren't the only ones with this problem.)

masfuerte 9 hours ago [ - ]

7za (the compression app) does blazingly fast iteration over any kind of folder. This doesn't require special code for git. Backblaze's backup app could do the same but rather than fix their code they excluded .git folders.

When I backup my computer the .git folders are among the most important things on there. Most of my personal projects aren't pushed to github or anywhere else.

Fortunately I don't use Backblaze. I guess the moral is don't use a backup solution where the vendor has an incentive to exclude things.

toast0 4 hours ago [ - ]

IMHO, you can't do blazingly fast iteration over folders with small files in Windows, because every open is hooked by the anti-virus, and there goes your performance.

noirscape an hour ago [ - ]

Not just antivirus, there's also file locking.

Windows has a much harsher approach to file locking than Linux and backup software like BackBlaze absolutely should be making use of it (lest they back up files that are being modified while they back them up), but that also means that the software effectively has to ask the OS each time to lock the file, then release the lock when the software is done with it. With a large amount of files, that does stack up.

Linux file locking is to put it mildly, deficient. Most software doesn't even bother acquiring locks in the first place. Piling further onto that, basically nobody actually uses POSIX locks because the API has some very heavy footguns (most notably, every lock on a file is released whenever any close() for that file is called, even if another component of the same process is also having a second lock open). Most Linux file locks instead work on the honor system; you create a file called filename.lock in the same directory as the file you're working on, and then any software that detects the filename.lock file exists should stop reading the file.

Nobody using file locks is probably the bigger reason why Linux chokes less on fast iteration than Windows, given that Windows is slow with loads of files even when you aren't running a virus scanner.

NetMageSCW 7 hours ago [ - ]

Actually once the initial backup is done there is no reason to scan for changes. They can just use a Windows service that tells them when any file is modified or created and add that file to their backup list.

Saris 8 hours ago [ - ]

Backblaze offers 'unlimited' backup space, so they have to do this kind of thing as a result of that poor marketing choice.

adithyassekhar 2 hours ago [ - ]

If they must scam, shouldn’t they be deduplicating on the server rather than the client?

conductr 6 hours ago [ - ]

No they don’t. They just have to price the product to reflect changing user patterns. When backblaze started, it was simply “we back up all the files on your drive” they didn’t even have a restore feature that was your job when you needed it. Over time they realized some user behavior changed, these Cloud drives where a huge data source they hadn’t priced in, git gave them some problems that they didn’t factor in, etc. The issue is there solution to dealing with it is to exclude it and that means they’re now a half baked solution to many of their users, they should have just changed the pricing and supported the backup solution people need today.