Exactly. Just yesterday someone posted about doing 250k messages/second with Redpanda (a Kafka-compatible implementation) on their laptop.

https://www.youtube.com/watch?v=7CdM1WcuoLc

Getting even less throughput than that on 3x c7i.24xlarge instances (288 vCPUs in total) is bafflingly wasteful.

Just because you can do something with Postgres doesn't mean you should.

> 1. One camp chases buzzwords.

> 2. The other camp chases common sense

In this case, is "Postgres" just being used as a buzzword?

[Disclosure: I work for Redpanda; we provide a Kafka-compatible service.]

Is it about what Kafka could do, or about what you need right now?

Kafka is a full on steaming solution.

Postgres isn't a buzzword. It can be a capable placeholder until it's outgrown, and one can arrive at Kafka later with an informed operating history from Postgres.

> Kafka is a full on steaming solution.

Freudian slip? ;)

Haha, and a typo!

This sounded interesting to me, and it looks like the plan is to make Redpanda open-source at some point in the future, but there's no timeline: https://github.com/redpanda-data/redpanda/tree/dev/licenses

Correct. Redpanda is source-available.

When you have C++ code, the number of external folks who want to contribute, and who can contribute effectively, drops considerably. Our "cousins in code" at ScyllaDB announced last year that they were moving to source-available because of the lack of OSS contributors:

> Moreover, we have been the single significant contributor of the source code. Our ecosystem tools have received a healthy amount of contributions, but not the core database. That makes sense. The ScyllaDB internal implementation is a C++, shard-per-core, future-promise code base that is extremely hard to understand and requires full-time devotion. Thus source-wise, in terms of the code, we operated as a full open-source-first project. However, in reality, we benefitted from this no more than as a source-available project.

Source: https://www.scylladb.com/2024/12/18/why-were-moving-to-a-sou...

People still want free utility from the source-available code. Less commonly, they want to be able to see the code to understand it and potentially troubleshoot it. Yet asking for active contribution is, for almost all, a bridge too far.

Note that prior to its license change ScyllaDB was using AGPL. This is a fully FLOSS license but may have been viewed nonetheless as somewhat unfriendly by potential outside contributors. The ScyllaDB license change was really more about not wanting to expend development effort on maintaining multiple versions of the code (AGPL licensed and fully proprietary), so they went for sort of a split-the-difference approach where the fully proprietary version was in turn made source-available.

(Notably, they're not arguing that open source reusers have been "unfair" to them and freeloaded on their effort, which was the key justification many others gave for relicensing their code under non-FLOSS terms.)

In case anyone here is looking for a fully-FLOSS contender they might want to contribute to, there's the interesting YugabyteDB project: https://github.com/yugabyte/yugabyte-db

I think the AGPL/proprietary license split, and the eventual move to fully proprietary, is just a slightly less overt version of the same "freeloader" argument. The intention of the original license was to make the software unpalatable to enterprises unless they bought the proprietary license, and one "benefit" of the move (at least for the bean counters) is that it stops even AGPL-friendly enterprises from being able to use the software freely.

(Personally, I have no issues with the AGPL and Stallman originally suggested this model to Qt IIRC, so I don't really mind the original split, but that is the modern intent of the strategy.)

I think the intention of the original license was to make the software unpalatable to SaaS vendors who want to keep their changes proprietary, not unpalatable to enterprises in general.

Rightly or wrongly, large companies are very averse to using AGPL software even if it would cause them very little additional burden to comply with the AGPL. Lots of projects use this cynically to help sell proprietary licenses (the proof of this is self-evident -- many such projects have CLAs and were happy to switch to a proprietary license that is even less favourable to enterprises than the AGPL as soon as it was available).

Again, I'm happy to use AGPL software; I just disagree that the intent here is all that different from the other projects that switched to the proprietary BSL.

I haven't actually talked with Henry Poole about the subject, but I'm pretty sure that was not his intent when he wrote it.

You are obviously free to choose to use a proprietary license, that's fine -- but the primary purpose of free licenses has very little to do with contributing code back upstream.

As a maintainer of several free software projects, there are lots of issues with how projects are structured and user expectations, but I struggle to see how proprietary licenses help with that issue (I can see -- though don't entirely buy -- the argument that they help with certain business models, but that's a completely different topic). To be honest, I have no interest in actively seeking out proprietary software, but I'm certainly in the minority on that one.

Right, open source is generally of benefit to users, not to the author, and users do get some of that benefit from being able to see the source. I wouldn't want to look at it myself, though, for legal reasons.

You can be open source and not take contributions. This argument doesn't make sense to me. Just stop doing the expensive part and keep the license as is.

I think the argument is that, if they expected to receive high-quality contributions, then they'd be willing to take the risk of competitors using their software to compete with them, which an open-source license would allow. It usually doesn't work out that way; with a strong copyleft license, your competitors are just doing free R&D improving your own product, unless they can convince your customers that they know more about the product than the guys who wrote it in the first place. But that's usually the fear.

On the other hand, if they don't expect people outside their company to know C++ well enough to contribute usefully, they probably shouldn't expect people outside their company to be able to compete with them either.

Really, though, the reason to go open-source is because it benefits your customers, not because you get contributions, although you might. (This logic is unconvincing if you fear they'll stop being your customers, of course.)

The statement is untrue. For example, ClickHouse is written in C++, and it has thousands of contributors overall, with hundreds of external contributors every month.

I think it's reasonably common for accepting external contributions to an open-source project to be more trouble than it's worth, just because most programmers aren't very good.

Your name sounds familiar. I think you may be one of the people at Redpanda with whom I've corresponded. It's been a few years though, so maybe not.

A colleague and I (mostly him, but on my advice) worked up a set of patches to accept and emit JSON and YAML in the CLI tool. Our use case at the time was setting things up with a config management system using the tool Redpanda already provides, without dealing with unstructured text.

We got a lot of good use out of Redpanda at that org. We've both moved on to a new employer, though, and the "no offering Redpanda as a service" clause spooked the company away from trying it without paying for the commercial package. Y'all assured a couple of us that our use case didn't count as that, but upper management and legal opted to go with Kafka just in case.

Doesn’t Kafka/Redpanda have to fsync for every message?

Yes, for Redpanda. There's a blog post about that:

"The use of fsync is essential for ensuring data consistency and durability in a replicated system. The post highlights the common misconception that replication alone can eliminate the need for fsync and demonstrates that the loss of unsynchronized data on a single node still can cause global data loss in a replicated non-Byzantine system."

For all that, though, Redpanda is still blazingly fast.

https://www.redpanda.com/blog/why-fsync-is-needed-for-data-s...

I'm highly skeptical of the method employed to simulate unsynced writes in that example. They use a non-clustered ZooKeeper and then just shut it down, breaking the Kafka controller and preventing any cluster state management (not just partition leader election), all while manually corrupting the log file. Oof. Is it really _that_ hard to lose acked data from a Kafka cluster that you had to go to such contrived and dubious lengths?

> while manually corrupting the log file

To be fair, since without fsync you don't have any ordering guarantees for your writes, a crash has a good chance of corrupting your data, not just losing recent writes.

That's why in PostgreSQL it's feasible to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G... but not to disable https://www.postgresql.org/docs/18/runtime-config-wal.html#G....
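For the curious, here's a minimal sketch of that distinction (assuming those two links point at synchronous_commit and fsync, and assuming a local PostgreSQL with the JDBC driver on the classpath; the events table is made up). synchronous_commit can be relaxed per transaction because the worst case is losing the most recent commits, while fsync=off risks real corruption, which is why there's no per-session switch for it:

    // Hypothetical demo: relax synchronous_commit for a single transaction.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AsyncCommitDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres")) {
                conn.setAutoCommit(false);
                try (Statement st = conn.createStatement()) {
                    // SET LOCAL applies only to this transaction: COMMIT returns
                    // before the WAL record is flushed to disk, so a crash can
                    // lose this insert, but the database stays consistent.
                    st.execute("SET LOCAL synchronous_commit = off");
                    st.execute("INSERT INTO events(payload) VALUES ('hello')");
                }
                conn.commit();
            }
        }
    }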

I just read the post and didn’t find it contrived at all. The point is to simulate a) network isolation and b) loss of recent writes.

Kafka no longer has a ZooKeeper dependency, and Redpanda never had one (this is just an aside for those reading along, not a rebuttal).

I've never looked at Redpanda, but Kafka absolutely does not. Kafka uses mmapped files and the page cache to manage durable writes. You can configure it to fsync if you like.
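For anyone who wants to turn that knob: flush.messages and flush.ms are the topic-level settings. A sketch using the Java AdminClient (broker address and topic name are placeholders); flush.messages=1 makes the broker fsync after every message, at an obvious throughput cost:

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class ForceFsyncPerMessage {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic =
                        new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                // Override the default (defer flushing to the OS page cache)
                // so the broker fsyncs the log after every single message.
                AlterConfigOp setFlush = new AlterConfigOp(
                        new ConfigEntry("flush.messages", "1"), AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Map.of(topic, List.of(setFlush)))
                     .all().get();
            }
        }
    }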

If I don’t actually want durable and consistent data, I could also turn off fsync in Postgres …

The tradeoff here is that Kafka will still work perfectly if one of its instances goes down. (Or you take it down, for upgrades, etc.)

Can you lose one Postgres instance?

AIUI Postgres has high availability out of the box, so it's not a big deal to "lose" one as long as a secondary can take over.

Only replication is built in; you need to add a cluster manager like Patroni to make it highly available.

Definitely not in the case of Kafka. Even with an SSD, fsync-per-message would limit it to around 100k messages/second. Batch commit allows Kafka (and Postgres) to amortize the fsync overhead over many messages.
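To make the batching half concrete, a hedged producer-side sketch in Java (broker address and topic are placeholders): linger.ms and batch.size let many records ride a single disk write, which is how the per-message fsync cost gets amortized on brokers that do flush.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BatchingProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Wait up to 5 ms to accumulate up to 64 KiB per partition, so
            // many records are written (and, where applicable, fsynced) together.
            props.put(ProducerConfig.LINGER_MS_CONFIG, "5");
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(64 * 1024));
            props.put(ProducerConfig.ACKS_CONFIG, "all");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 10_000; i++) {
                    producer.send(new ProducerRecord<>("my-topic", "key-" + i, "value-" + i));
                }
            }
        }
    }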

On enterprise-grade storage, writes go to NVRAM buffers before being flushed to persistent storage, so this isn't much of a bottleneck.

The context was somebody doing this on their laptop.

I was expanding the context.

No, it's for every batch.

On the issue of complexity: is Redpanda suitable as a single-node deployment, where a Kafka cluster isn't needed due to data volume but the Kafka message-bus pattern is desired?

AKA "Medium Data" ?

Yes. I’ve run projects where it was used that way.

It also scales to very large clusters.