Committing to NVMe drive properly is really costly. I'm talking using O_DIRECT | OSYNC or fsync here. Can be in the order of whole milliseconds, easily. And it is much worse if you are using cloud systems.

It is actually very cheap if done right. Enterprise SSDs have write-through caches, so an O_DIRECT|O_DSYNC write is sufficient, if you set things up so the filesystem doesn't have to also commit its own logs.

I just tested the mediocre enterprise nvme I have sitting on my desk (micron 7400 pro), it does over 30000 fsyncs per second (over a thunderbolt adapter to my laptop, even)

Another complexity here besides syncs per second is the size of the requests and duration of this test, since so many products will have faster cache/buffer layers which can be exhausted. The effect is similar whether this is a "non-volatile RAM" area on a traditional RAID controller, intermediate write zones in a complex SSD controller, or some logging/journaling layer on another volume storage abstraction like ZFS.

It is great as long as your actual workload fits, but misleading if a microbenchmark doesn't inform you of the knee in the curve where you exhaust the buffer and start observing the storage controller as it retires things from this buffer zone to the other long-term storage areas. There can also be far more variance in this state as it includes not just slower storage layers, but more bookkeeping or even garbage-collection functions.

If you tested this on macos, be careful. The fsync on it lies.

nope, linux python script that writes a little data and calls os.fsync

What's a little data?

In many situations, fsync flushes everything, including totally uncorrelated stuff that might be running on your system.

fsync on most OSes lie to some degree

Isn't that why a WAL exists, so you didn't actually need to do that with eg postgres and other rdbms?

You must still commit the WAL to disk, this is why the WAL exists it writes ahead to the log on durable storage. Its doesn't have to commit the main storage to disk only the WAL which is better since its just an append to end rather than placing correctly in the table storage which is slower.

You must have a single flushed write to disk to be durable, but it doesn't need the second write.