I'm with you on this one. Massive IO on directories with many files is only reliable when a single process has access to them, which is not the case in practice: filesystems are by definition open to concurrent IO. Even though it's true that several processes doing uncoordinated reads and writes in the same directories is not the typical case, I'm not sure it's something one can afford to ignore.

But in the end both npm and git end up writing files en masse in their use cases, regardless of what metadata could be moved into an SQLite-like db. Making things faster safely really implies having those apps rely on OS features that would allow acquiring locks and commit semantics on filesystem subtrees, or something equivalent.
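
(For concreteness: in the absence of subtree-level lock/commit semantics, tools today fall back on a single exclusive lock file, similar in spirit to git's index.lock. Below is a rough sketch of that pattern in Python; the function and path names are made up for illustration, not any tool's real code.)

    import os

    def with_exclusive_lock(lock_path, update):
        # O_CREAT | O_EXCL makes creation atomic: exactly one process
        # wins the lock; everyone else gets FileExistsError and must retry.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        try:
            update()  # do the multi-file work while holding the lock
        finally:
            os.close(fd)
            os.unlink(lock_path)  # release the lock by removing the file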

I'm curious:

Let's take that as a given, i.e. massive IO works reliably only when a single process has access.

How will SQLite handle concurrent access by multiple processes when git/npm/whoever switches over to it?

(A: It doesn't; even in a single process, you need to obtain a global lock to write while all readers pause.)

Your "A" hasn't been true for over a decade if you use WAL mode:

https://sqlite.org/wal.html

Readers don't have to pause.
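
For anyone who hasn't tried it, a minimal sketch with Python's stdlib sqlite3 (two connections in one process stand in for two processes; the file and table names are made up for the demo):

    import os, sqlite3, tempfile

    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None = autocommit; we issue BEGIN/COMMIT ourselves.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # persistent setting for this file
    writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
    writer.execute("INSERT INTO kv VALUES ('a', 'old')")

    # Open a write transaction and leave it uncommitted.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("UPDATE kv SET v = 'new' WHERE k = 'a'")

    # A second connection reads without pausing; it sees the last
    # committed snapshot, never the in-flight write.
    reader = sqlite3.connect(path, isolation_level=None)
    print(reader.execute("SELECT v FROM kv").fetchall())  # [('old',)]

    writer.execute("COMMIT")
    # A fresh read (the previous cursor was exhausted, ending its snapshot)
    # now sees the committed value.
    print(reader.execute("SELECT v FROM kv").fetchall())  # [('new',)]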

Thank you! I knew about WAL but could have sworn all reads paused to avoid being stale. Now that I think about it, that was my own workaround: the app layer knew a write was pending (it was still in memory), so it had to poll until the update actually showed up.

Except the statement above is that parallel access to the file system does not work reliably. Sqlite lives on the file system.

It's not access to the file system in general that doesn't work reliably -- it's specifically massive access across thousands of files at the same time by multiple processes.

SQLite lives inside a single file. I've never heard of any corruption issues in practice, even with thousands of high-throughput reads and writes -- the kinds being complained about here -- because this is something SQLite is really good at.

I was considering walking down this road, because it's really core to the absurdity of this thread, innit?

But the original post sort of handwaves about what pathological filesystem use is.

The examples they chose (git & npm) imply that the problem is the sheer number of files.

I posit that, just as it was easy to handwave that SQLite is obviously superior for npm/git to using N files, it'll be equally easy to handwave that concurrency won't be a problem because SQLite is one file instead of many.

YMMV. In my experience, concurrent process access to SQLite databases is a one-way ticket to database corruption city. I’ve not had the same problems with single-process concurrent access.

This just sounds like you haven't been using transactions. SQLite upholds transaction guarantees.
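For what it's worth, a minimal sketch of what that looks like from Python's stdlib sqlite3 -- the schema (packages, audit_log) and function name are hypothetical, but the pattern (WAL, a busy timeout, and an explicit transaction around the multi-statement write) is what keeps concurrent writers from stepping on each other:

    import sqlite3

    def record_package(db_path, name, version, blob):
        conn = sqlite3.connect(db_path, isolation_level=None)  # explicit transactions
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout = 5000")  # wait on contention instead of failing
        try:
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            conn.execute(
                "INSERT OR REPLACE INTO packages(name, version, data) VALUES (?, ?, ?)",
                (name, version, blob),
            )
            conn.execute(
                "INSERT INTO audit_log(name, version) VALUES (?, ?)",
                (name, version),
            )
            conn.execute("COMMIT")  # both rows land, or neither
        except Exception:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
        finally:
            conn.close()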