It's interesting that both NTFS and APFS have now apparently been declared 'broken' due to performance problems with two very specific applications (Git and NPM), and that the reverse statement, that it's perhaps those applications that are being unreasonable, isn't quite as fashionable.
I mean, I don't use NPM, but I recently copied over some small-ish raw Git repositories between machines, and had several WTF moments when I looked at what was actually going over the wire.
Instead of refactoring two of the planet's most widely-used 'consumer' file systems, perhaps https://www.sqlite.org/appfileformat.html should be seen as a more reasonable way forward?
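To make it concrete: the 'pile of tiny files' pattern collapses into something like this. A rough sketch using Python's stdlib sqlite3 (the file name and schema here are made up for illustration, not anything git or npm actually does):

    import sqlite3

    # One SQLite file stands in for a tree of thousands of small files.
    db = sqlite3.connect("project.bundle")
    db.execute("PRAGMA journal_mode=WAL")
    db.execute("""
        CREATE TABLE IF NOT EXISTS blobs (
            path TEXT PRIMARY KEY,   -- logical path, e.g. 'objects/ab/cdef'
            data BLOB NOT NULL
        )""")

    def put(path, data):
        with db:  # one transaction, one sync -- no per-file open/close/stat
            db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)", (path, data))

    def get(path):
        row = db.execute("SELECT data FROM blobs WHERE path = ?", (path,)).fetchone()
        return row[0] if row else None

One open file, one set of locks, and the 'directory with 100k entries' problem never comes up.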
I'm with you on this one. Massive IO on directories with many files is only reliable when a single process has access to them, which is not the case: filesystems are by definition open to concurrent IO. Even though it's true that several processes doing uncoordinated reading and writing in the same directories is not a typical case, I'm not sure it's something one can afford to ignore.
But in the end both npm and git end up mass-writing files in their use cases, regardless of the metadata that could be put in a SQLite-like db. Making things faster safely really implies having those apps operate on some OS feature that would allow acquiring locks and commit semantics on filesystem subtrees or equivalent.
I'm curious:
Let's take that as given, i.e. that massive IO works reliably only when a single process has access.
How will SQLite handle concurrent access by multiple processes when git/npm/whoever switches over to it?
(A: It doesn't; even in a single process, you need to obtain a global lock to write while all readers pause.)
Your "A" hasn't been true for over a decade if you use WAL mode:
https://sqlite.org/wal.html
Readers don't have to pause.
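Rough illustration with Python's sqlite3, two connections standing in for two separate processes (this assumes the database lives on a local disk; WAL explicitly doesn't support network filesystems):

    import sqlite3

    # "Writer process"
    w = sqlite3.connect("shared.db", isolation_level=None)  # manage transactions by hand
    w.execute("PRAGMA journal_mode=WAL")   # persistent: stored in the database file
    w.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    w.execute("BEGIN IMMEDIATE")           # writer takes the single write lock
    w.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")

    # "Reader process": a second connection, opened while the write is in flight.
    r = sqlite3.connect("shared.db")
    # Under WAL this SELECT reads the last committed snapshot and returns
    # immediately; it does not wait for the writer's still-open transaction.
    print(r.execute("SELECT v FROM kv WHERE k = 'a'").fetchone())

    w.execute("COMMIT")

Writers still serialise against each other (there is only one write lock), but readers never block and are never blocked.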
Thank you! I knew about WAL but swore all reads paused to avoid being stale. Now that I think about it, that was my workaround for polling for an update that, from the app-level perspective, should already be there, because the app knows about the pending write that's still in memory.
Except the statement above is that parallel access to the file system does not work reliably. Sqlite lives on the file system.
It's not access to the file system in general that doesn't work reliably -- it's specifically massive access across thousands of files at the same time by multiple processes.
SQLite lives inside a single file. I've never heard of any corruption issues in practice, even with thousands of high-throughput reads and writes -- the kinds that are being complained about -- because this is something SQLite is really good at.
I was considering walking down this road, because it's really core to the absurdity of this thread, innit?
But, the original post sort of handwaves about what pathological filesystem use is.
The examples they chose (git & npm) imply # of files.
I posit that, as easy as it was to handwave that SQLite is obviously superior to N files for npm/git, it'll be equally easy to handwave that it won't be a problem because SQLite is one file instead of many.
YMMV. In my experience, concurrent process access to SQLite databases is a one-way ticket to database corruption city. I’ve not had the same problems with single-process concurrent access.
This just sounds like you haven't been using transactions. SQLite upholds transaction guarantees.
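For what it's worth, this is roughly the pattern that has been reliable for me with several writer processes sharing one database (a sketch only; the schema and the 5-second timeout are arbitrary):

    import sqlite3

    def open_db(path):
        conn = sqlite3.connect(path, isolation_level=None)  # explicit transactions
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s for the write lock
        conn.execute("CREATE TABLE IF NOT EXISTS events (name TEXT, payload BLOB)")
        return conn

    def record_event(conn, name, payload):
        # BEGIN IMMEDIATE takes the write lock up front, so the block below
        # either commits or rolls back as one unit -- no interleaved half-writes.
        conn.execute("BEGIN IMMEDIATE")
        try:
            conn.execute("INSERT INTO events VALUES (?, ?)", (name, payload))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise

The cases of genuine multi-process corruption I'm aware of tend to involve network filesystems with broken locking, which is outside SQLite's control.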
"NTFS" is fine, the Windows File System Filters are the culprit of I/O performance problems [with many small files/requests] on Windows.
Using a second partition D: is already twice as fast at small-file-writes compared to the system partition C:. This was on Windows 10 with both partitions using NTFS on the same SSD, and everything else on default settings.
This is because C: has additional file system filters for system integrity that are not present on other drives; and also because C: has 8.3 name compatibility enabled by default, but additional NTFS partitions have that disabled. (so if your filenames are longer than the 8.3 format, Windows has to write twice the number of directory entries)
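Easy enough to check with a crude timing sketch like the following (Python; file count, size, and paths are arbitrary, and it obviously assumes you have both an NTFS C: and D:):

    import os, time, uuid

    def time_small_writes(directory, count=5000, size=1024):
        """Create `count` small files in `directory` and return elapsed seconds."""
        os.makedirs(directory, exist_ok=True)
        payload = os.urandom(size)
        start = time.perf_counter()
        for _ in range(count):
            # Long names make 8.3 short-name generation kick in where it's enabled.
            name = "longfilename-" + uuid.uuid4().hex + ".tmp"
            with open(os.path.join(directory, name), "wb") as f:
                f.write(payload)
        return time.perf_counter() - start

    print("C:", time_small_writes(r"C:\temp\fswrite-test"))
    print("D:", time_small_writes(r"D:\fswrite-test"))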
It's shocking how much performance you gain by temporarily turning off Windows Defender. I had a local disk to disk copy that the Windows File Transfer dialog estimated was going to finish in 10 hours after it settled down. It wasn't even that much data, just a few hundred GB, but that consisted of a great number of small files. On a hunch I tried disabling Windows Defender and the estimate dropped to about 30 minutes, and in the end it was spot on.
Anti-virus kills small-file IO. I work with a Windows product that can deal with huge numbers of code files. With large enterprises that demand AV be on in most places, the performance loss is pretty staggering: you can easily lose 50% of your servers' capacity per measurement interval.
More people need to know about this; it is absolutely bonkers how much perf you lose on each file-open operation to Defender and its kin.
Even excluding directories (like D:\MyProject) from Defender isn't a full solution: the fs filter will still use the file handle to look up the file path to see whether it is in an excluded directory (at least on Win10). Excluding an entire drive (like D:\) does solve this, though.
I have found that adding git.exe to the process exclusion list makes the biggest difference here; although git is still slow when dealing with lots of files, it goes from unbearable to bearable.
Doesn’t this just lead to a world where every application has its own specific schema and we lose any ability to operate on files in a generic manner? So basically like iOS, and I don’t see many people using iOS as their development environment.
I have had reports from my colleagues that some Ruby build or something is much faster when using an Ubuntu virtual machine on their M3 macOS laptops. I don't remember the numbers, but it was up to 5x faster in the VM.
I had Ubuntu be 6-7x faster than Windows 10, running Ubuntu in VirtualBox.
I am quite surprised by:
- the claim that pathological filesystem behavior => git/npm/whoever should use a SQLite DB as a flat file rather than having multiple files.
- no pushback
- the reply w/"yes, definitely, by shoving all the files into a SQLite DB, concurrency & contention issues simply disappear"
There's a sort of comic "turtles all the way down" aspect to it.
Why not just have the filesystem be a SQLite DB then? (A: because it would suffer from all of the same problems, except worse, because it is not a filesystem)
Not necessarily SQLite, but RocksDB is looking like a promising option for a foundational storage system. You have things like ZenFS, which runs RocksDB directly on NVMe storage, and then you have a lot of higher-level systems on top of it, like Ceph's BlueStore and MySQL/MariaDB's MyRocks. It is not much of a stretch to imagine a whole OS built around it.
What is notable here is that it would be both better aligned with how the underlying storage hardware works, and simultaneously offer a more usable interface for applications.
It is pretty apparent that the biggest reason we commonly use POSIX-y files is inertia rather than them being a very good fit for anything these days.
https://en.wikipedia.org/wiki/RocksDB#Integration
https://github.com/westerndigitalcorporation/zenfs
> Why not just have the filesystem be a SQLite DB then?
One major reason not to is that there needs to be a guaranteed way to free up space, e.g. by deleting a file, or doing something similar. Currently SQLite doesn't provide any means to reliably reclaim space without allocating even more first.
Also the API is quite different: e.g. you can write(2) a single byte at a time if you really want to, and it's relatively fast compared to doing an INSERT each time, since it goes to the temp VFS buffer first.
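A quick way to feel that difference (Python sketch; 1,000 writes is arbitrary, and the INSERT loop deliberately uses one autocommit transaction per row, which is the worst case):

    import os, sqlite3, time

    N = 1_000

    # N one-byte write(2) calls: each just lands in an OS buffer, no sync per call.
    fd = os.open("bytes.bin", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    t = time.perf_counter()
    for _ in range(N):
        os.write(fd, b"x")
    os.close(fd)
    print("write(2) per byte:", time.perf_counter() - t)

    # N one-row INSERTs, each its own transaction: every commit is a journal
    # write plus (by default) a sync, which is where the cost difference comes from.
    db = sqlite3.connect("bytes.db", isolation_level=None)
    db.execute("CREATE TABLE IF NOT EXISTS b (x BLOB)")
    t = time.perf_counter()
    for _ in range(N):
        db.execute("INSERT INTO b VALUES (?)", (b"x",))
    print("INSERT per byte:  ", time.perf_counter() - t)

Batching the INSERTs into one transaction closes most of the gap, but then you've reinvented buffering at the application level.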
> Why not just have the filesystem be a SQLite DB then?
A lot of people have proposed exactly this, that modern filesystems ought to be relational databases at the directory/metadata level.
It would be amazing to be able to interface with the file system using SQL and have the power of transactions.
> A: because it would suffer from all of the same problems, except worse, because it is not a filesystem
Not really. Database technology really is far more advanced in many ways. It would not suffer from many of the filesystem's problems, and would actually be better in several respects.
This is quite a claim to me, but only because the Vista FS died on these shoals, and because it seems we’ve had plenty of years for this idea to reach some sort of practical implementation and there’s none that’s in wide use that I’m aware of.
I think compatibility is just too much of an issue. Creating an entirely new filesystem like that, with next-generation tools around things like transactions, while keeping backwards compatibility with every quirk of every way things operate currently... it feels like an almost impossible architectural challenge. While if you just started from scratch with brand-new API's, existing applications wouldn't work with it.
Still, it doesn't stop people constantly bringing it up as a wish. Technically it would be amazing, and it wouldn't be hard to build. It just feels impossible to figure out how to migrate to.
I don’t really get it but I’m p sure it’s one of those things where I’m being a know-nothing engineer. (Been on mobile my whole life; 2009-2015 I owed primarily to a SQLite-based app; p much 0 filesystem knowledge beyond the horrible analogy incoming.)
The unfiltered reaction I have is “the filesystem is essentially Map<String, ByteData>, how would adding another abstraction layer _help_? How could it be impossible to migrate?” You’ve been really informative, so ignore my kvetching here, but the kvetching side firmly holds: if this was such a great idea, _someone_ would have landed it _somewhere_, or there’d be clearer benefits stated than handwaving that git and npm got it horribly wrong, and clearer problems stated than “we don’t know how to migrate to it if we did it”.
See also: SQLite: 35% Faster Than File System
https://sqlite.org/fasterthanfs.html
I think it might be a lot more than that, on Windows and macOS.
Edit: Yeah, 5x boost on Win10. I wish the author had included XP for reference!
Edit 2: Author achieved 14x boost with a certain SQLite mode, over Windows 10.
> The third chart shows that reading blob content out of SQLite can be twice as fast as reading from individual files on disk for Mac and Android, and an amazing ten times faster for Windows.
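The shape of that benchmark is easy to approximate, though nowhere near as careful as the sqlite.org one (Python sketch; blob count and size are arbitrary):

    import os, sqlite3, time

    N, SIZE = 10_000, 10_000   # 10k blobs of ~10 kB each
    payload = os.urandom(SIZE)

    # Same data twice: once as a directory of small files, once as rows in SQLite.
    os.makedirs("blobs", exist_ok=True)
    for i in range(N):
        with open(f"blobs/{i}.bin", "wb") as f:
            f.write(payload)

    db = sqlite3.connect("blobs.db")
    with db:
        db.execute("CREATE TABLE IF NOT EXISTS b (id INTEGER PRIMARY KEY, data BLOB)")
        db.executemany("INSERT OR REPLACE INTO b VALUES (?, ?)",
                       ((i, payload) for i in range(N)))

    t = time.perf_counter()
    for i in range(N):
        with open(f"blobs/{i}.bin", "rb") as f:
            f.read()
    print("files :", time.perf_counter() - t)

    t = time.perf_counter()
    for i in range(N):
        db.execute("SELECT data FROM b WHERE id = ?", (i,)).fetchone()
    print("sqlite:", time.perf_counter() - t)

The per-file open/read/close round-trips are exactly where Defender and the other filter drivers get their hooks in, which is presumably why the Windows gap is so much bigger.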
> Why not just have the filesystem be a SQLite DB then?
Already exists: https://github.com/narumatt/sqlitefs
No idea what the performance is like, though… it uses FUSE, so that is going to add some overhead compared to an in-kernel filesystem
It's shockingly faster in WSL too.
That might not be a filesystem issue. If NPM pulling a complex dependency tree down spawns a lot of sub-processes then it'll hit the same problem many build operations have when transplanted from Unix-a-likes to Windows (via cygwin, for instance): building up and tearing down processes is relatively expensive under Windows. Under many Unix-a-like systems forking a process is generally not much more expensive than starting a new thread, which is why multi-process solutions are more popular away from Windows (if the cost of using processes over threads is low, then it is worth paying it to get the reduced complexity of not needing to worry about thread safety).
Assuming you are using WSL2 (which is essentially Linux in a VM, unlike WSL1 which was a compatibility layer with processes running more directly on Windows) you will see this difference (any performance cost imposed by the VM will be small compared to the difference in process spin-up costs).
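You can see the spin-up cost directly with something like this, run once on native Windows and once inside WSL2 (Python sketch; 200 spawns is arbitrary, and spawning Python itself also drags in its own interpreter startup cost):

    import subprocess, sys, time

    N = 200
    t = time.perf_counter()
    for _ in range(N):
        # Spawn a child that exits immediately: the loop is dominated by
        # process creation and teardown, not by any real work.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    elapsed = time.perf_counter() - t
    print(f"{N} spawns: {elapsed:.2f}s ({elapsed / N * 1000:.1f} ms each)")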
I don't think that this will apply to git, though.
Is it also faster when you use a host OS dir? (WSL can write to those, and it might still bypass a lot of the filter stuff.)