The claim of sub-millisecond writes with data in S3 is false and impossible. If you look at the benchmark, the fsync is not timed, so this is just the latency of either the network or in-kernel file operations, depending on the mount settings.
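
To make the distinction concrete, here is a minimal Python sketch of timing a write with and without the fsync. It is not the ZeroFS benchmark; the function name `time_write_and_fsync`, the file name, and the 4 KiB payload are made up for illustration, and the numbers you get depend entirely on the filesystem and mount options underneath.

```python
import os
import tempfile
import time


def time_write_and_fsync(path: str, payload: bytes) -> tuple[float, float]:
    """Return (write_seconds, write_plus_fsync_seconds) for a single write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        os.write(fd, payload)   # may only reach a page cache or client-side buffer
        after_write = time.perf_counter()
        os.fsync(fd)            # returns once the filesystem reports the data as flushed
        after_fsync = time.perf_counter()
    finally:
        os.close(fd)
    return after_write - start, after_fsync - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w, wf = time_write_and_fsync(os.path.join(d, "probe.bin"), b"x" * 4096)
        print(f"write only: {w * 1e6:.1f} us, write + fsync: {wf * 1e6:.1f} us")
```

A latency figure described as "at rest" would have to be the second number, not the first.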

I hate it when databases celebrate their performance without synchronous flushing. You should be clear about the data loss window (which should be zero for committed transactions by default!) and the flushing interval to persistent storage.

I'm okay if you batch writes, I'm okay if you offer a low-latency mode with less durability, but by being unclear about this it just feels like a scam.

Where is the "data loss window"? Between nodes or between the client and the infra?

The front page lists under Capabilities:

> 4.8
>
> Honest fsync
>
> A successful fsync means every acknowledged write is durable in S3. If a failover may have lost unflushed writes, the next fsync returns an error instead of a false success.
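
As I read that description, the contract could be modeled roughly like the toy below. This is an in-memory sketch, not ZeroFS code: the class `ToyHonestFile`, its fields, and the choice to report the loss exactly once with `EIO` are all my assumptions.

```python
import errno


class ToyHonestFile:
    """Toy in-memory model of an 'honest fsync'; hypothetical, not ZeroFS code."""

    def __init__(self) -> None:
        self.durable: list[bytes] = []    # stands in for data at rest in S3
        self.buffered: list[bytes] = []   # accepted by write(), not yet flushed
        self.lost_writes = False          # set when a failover dropped buffered data

    def write(self, data: bytes) -> int:
        self.buffered.append(data)
        return len(data)

    def failover(self) -> None:
        # A new node takes over; anything only buffered on the old node is gone.
        if self.buffered:
            self.buffered.clear()
            self.lost_writes = True

    def fsync(self) -> None:
        if self.lost_writes:
            # Assumption: the loss is reported once, then the file is usable again.
            self.lost_writes = False
            raise OSError(errno.EIO, "unflushed writes were lost in a failover")
        self.durable.extend(self.buffered)
        self.buffered.clear()
```

The point of the model: the data loss window is whatever sits in `buffered`, and the caller learns about a loss at the next `fsync()` rather than getting a false success.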

Someone else mentioned: "write() is buffered (that's the batching) and 'committed' maps to fsync(), which returns only once data is durable."

---

It sounds like all writes are written synchronously to at least one node but failovers/replicas are just eventually consistent. If so, latency between nodes is not within ZeroFS's control and including. Or are you saying that the latency is impossible for even a single node? If so, that would mean much more than just a footnote is needed.

---

I don't see an issue with the benchmark but I might not be looking in the right place.

The [sequential writes](https://github.com/Barre/ZeroFS/blob/ec32199d48d0409d4cccd44...) and [append-only writes](https://github.com/Barre/ZeroFS/blob/ec32199d48d0409d4cccd44...) start where I'd expect and `success: true,` should equate to `fsync()` as previously mentioned.

(Apologies if I got this wrong, still learning this)

Yeah, in this case the footnote to the write latency specifically says “at rest in S3”, which is what caused me to go look at the source. To be clear, I have no problem with ZeroFS only flushing on fsync.

I am very excited for object-storage-first systems like this to leverage low-latency zonal storage for write-ahead logs, keeping the disaggregated storage while greatly reducing write latency. That ends up being more expensive, but it is likely a good tradeoff in lots of cases I have seen.

ZeroFS aims to be a POSIX filesystem, so the semantics here are the standard ones (ext4 and xfs behave the same): write() is buffered (that's the batching) and "committed" maps to fsync(), which returns only once data is durable.
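
For anyone following along, the usual application-side pattern under those semantics looks something like this Python sketch. The helper name `append_committed` and the file name are hypothetical; the only real calls are the standard `os.open`, `os.write`, `os.fsync`, and `os.close`.

```python
import os
import tempfile


def append_committed(path: str, record: bytes) -> None:
    """Append a record and return only after fsync() has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:                  # write() may accept fewer bytes than offered
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                 # raises OSError if the flush failed
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "log.bin")
        append_committed(log, b"one\n")
        append_committed(log, b"two\n")
        with open(log, "rb") as fh:
            print(fh.read())
```

The caller should treat the record as committed only once the function returns without raising; a successful `write()` alone promises nothing about durability.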

Nothing wrong with that, but you should remove the “at rest in S3” footnote from the write latency on the front page of the website, because that is not what is measured.
