Doesn’t Kafka/Redpanda have to fsync for every message?

Yes, for Redpanda. There's a blog about that:

"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."

That said, Redpanda is still blazingly fast.

https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...

I'm highly skeptical of the method employed to simulate unsync'd writes in that example. It uses a non-clustered ZooKeeper and then just shuts it down, which breaks the Kafka controller and prevents any Kafka cluster state management (not just partition leader election), all while manually corrupting the log file. Oof. Is it really _that_ hard to lose ack'd data from a Kafka cluster that you had to go to such contrived and dubious lengths?

> while manually corrupting the log file

To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.
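To make the ordering point concrete, here's a minimal sketch of the classic append-then-commit pattern (the record format and marker are made up for illustration). The fsync between the two writes is what orders them on disk; drop it and a crash can leave a commit marker pointing at garbage:

```python
import os
import tempfile

COMMIT_MARKER = b"\xC0\xFE"  # hypothetical end-of-record marker

def append_record(fd: int, payload: bytes) -> None:
    """Append one length-prefixed record followed by a commit marker.

    The fsync between the two writes is the ordering guarantee: the
    record must reach disk before the marker does. Without it, the
    kernel may flush the pages in either order, so a crash can leave
    a marker that validates garbage (a corrupt log, not just a lost tail).
    """
    os.write(fd, len(payload).to_bytes(4, "big") + payload)
    os.fsync(fd)                 # record is durable before...
    os.write(fd, COMMIT_MARKER)  # ...the marker that validates it
    os.fsync(fd)

# Usage sketch
path = tempfile.mktemp()
fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
append_record(fd, b"hello")
os.close(fd)
```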

That's why in PostgreSQL it's feasible to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G... but not to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G....
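The anchors in those links are truncated, but assuming they refer to the usual pair of WAL settings, the contrast looks like this:

```
# postgresql.conf (assumed settings; the links above are truncated)
synchronous_commit = off   # feasible: a crash loses only recent commits
fsync = on                 # disabling this risks unrecoverable corruption
```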

I just read the post and didn’t find it contrived at all. The point is to simulate a) network isolation and b) loss of recent writes.

Kafka no longer has a ZooKeeper dependency and Redpanda never did (this is just an aside for those reading along, not a rebuttal).

I've never looked at Redpanda, but Kafka absolutely does not. Kafka uses memory-mapped files and the page cache to manage durable writes. You can configure it to fsync if you like.
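For anyone curious, the broker knobs look roughly like this (property names from Kafka's broker config; the defaults leave flushing entirely to the OS, and per-topic overrides also exist):

```
# server.properties -- force fsync instead of relying on the page cache
log.flush.interval.messages=1   # fsync after every message (durable, slow)
# or amortize:
log.flush.interval.ms=1000      # fsync at most once per second
```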

If I don’t actually want durable and consistent data, I could also turn off fsync in Postgres …

The tradeoff here is that Kafka will still work perfectly if one of its instances goes down. (Or you take it down, for upgrades, etc.)

Can you lose one Postgres instance?

AIUI Postgres has high-availability out of the box, so it's not a big deal to "lose" one as long as a secondary can take over.

Only replication is built in; you need to add a cluster manager like Patroni to make it highly available.

Definitely not in the case of Kafka. Even with an SSD, per-message fsync would limit throughput to around 100k messages per second. Batch commit allows Kafka (and Postgres) to amortize the fsync overhead over many messages.
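The amortization can be sketched in a few lines (names here are hypothetical, not any broker's actual API): many messages share a single write and a single fsync, so the per-message cost is the flush latency divided by the batch size.

```python
import os
import tempfile

def flush_batch(fd: int, messages: list[bytes]) -> None:
    """Group commit: one write and one fsync cover the whole batch.

    If a single fsync costs ~1 ms, syncing per message caps throughput
    near 1k msg/s; fsyncing once per 1000-message batch amortizes the
    same cost down to ~1 us per message.
    """
    os.write(fd, b"".join(messages))  # one buffered write for the batch
    os.fsync(fd)                      # one flush amortized over all of it

# Usage sketch: 1000 messages, a single fsync
batch_path = tempfile.mktemp()
fd = os.open(batch_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
flush_batch(fd, [b"msg %d\n" % i for i in range(1000)])
os.close(fd)
```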

On enterprise-grade storage, writes go to NVRAM buffers before being flushed to persistent storage, so this isn't much of a bottleneck.

The context was somebody doing this on their laptop.

I was expanding the context

No, it's for every batch.