There's a reason this is on my blog and not a paper in a journal. This isn't supposed to show the absolute speed of either tool; the benchmark is not set up for that. I do state in the blog post that Redis has more performance left on the table.

The main issue is that a reader might mistake Redis for a 2X faster postgres. Memory is 1000X faster than disk (SSD), and even with network overhead Redis can still be 100X as fast as postgres for caching workloads.

Otherwise, the article does well to show that we can get a lot of baseline performance either way. Sometimes a cache is premature optimisation.

That's the reader's fault then. I see the blog post as a counter to the insane resume-building, over-engineered architecture you see at a lot of non-tech companies. Oh, you need a cache for our 25-user internal web application? Let's put a Redis cluster with Elasticsearch in front, using an LLM to publish cache invalidation through Kafka.

There's also a sort of anti-everything attitude that gets boring and lazy. Redis is about the simplest thing possible to deploy. This wasn't about "a Redis cluster with Elasticsearch using an LLM"; it was just Redis.

I sometimes read this stuff like people explaining how they replaced their spoon and fork with a spork and measured only a 50% decrease in food-eating performance. And have you heard of the people with a $20,000 Parisian cutlery set to eat McDonald's? I just can't understand those insane fork enjoyers and their over-engineered dining experience.

Software development has such a pro-complexity culture that, I think, we need more anti-stuff or pushback.

There is this CV-driven development where you have to use Redis, Kafka, Mongo, Rabbit, Docker, AWS, job schedulers, microservices, and so on.

The fewer dependencies my project has, the better. If it's not needed, why use it?

If your cache fits in Redis then it fits in RAM, and if your cache fits in RAM then Postgres will serve it from RAM just as well.

Writes will go to RAM as well if you have synchronous_commit = off.
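For reference, a minimal sketch of that setting; `synchronous_commit` is the actual Postgres name (there's no bare `synchronous` option in Postgres; that's the SQLite pragma):

```ini
# postgresql.conf - report commits without waiting for the WAL flush to disk.
# A crash can lose the last few transactions, but the data stays consistent.
synchronous_commit = off
```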

Not necessarily true. If you're sharing the database with your transaction workload your cache will be paged out eventually.

This was my take as well, but I'm a MySQL / Redis shop. I really have no idea what tables MySQL has in RAM at any given moment, but with Redis I know what's in RAM.

> The main issue is that a reader might mistake Redis for a 2X faster postgres. Memory is 1000X faster than disk (SSD), and even with network overhead Redis can still be 100X as fast as postgres for caching workloads.

Your comments suggest that you are definitely missing some key insights into the topic.

If you, like the whole world, consume Redis through a network connection, it should be obvious to you that network is in fact the bottleneck.

Furthermore, using an RDBMS like Postgres may indeed mean storing data on slower media. However, you are ignoring the obvious fact that a service such as Postgres also has its own memory cache, and some query results can be, and indeed are, fetched from RAM. So it's not like every single query forces a disk read.
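For context, that cache is governed mainly by `shared_buffers` (on top of the OS page cache). A `postgresql.conf` fragment might look like this; the values are illustrative, not a recommendation:

```ini
# postgresql.conf - give Postgres a sizeable in-memory buffer cache.
# A common starting point is ~25% of system RAM; tune for your workload.
shared_buffers = 4GB
# Planner hint for how much data the OS page cache likely holds on top.
effective_cache_size = 12GB
```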

And at the end of the day, what exactly is the performance tradeoff? And does it pay off to spend more on an in-memory cache like Redis to buy you the performance delta?

That's why real world benchmarks like this one are important. They help people think through the problem and reassess their irrational beliefs. You may nitpick about setup and configuration and test patterns and choice of libraries. What you cannot refute are the real world numbers. You may argue they could be better if this and that, but the real world numbers are still there.

> If you, like the whole world, consume Redis through a network connection

I think "you are definitely missing some key insights onto the topic". The whole world is a lot bigger than your anecdotes.

> If you, like the whole world, consume Redis through a network connection, it should be obvious to you that network is in fact the bottleneck.

Not to be annoying - but... what?

I specifically _do not_ use Redis over a network. It's wildly fast. High volume data ingest use case - lots and lots of parallel queue workers. The database is over the network, Redis is local (socket). Yes, this means that each server running these workers has its own cache - that's fine, I'm using the cache for absolutely insane speed and I'm not caching huge objects of data. I don't persist it to disk, I don't care (well, it's not a big deal) if I lose the data - it'll rehydrate in such a case.

Try it some time, it's fun.

> And at the end of the day, what exactly is the performance tradeoff? And does it pay off to spend more on an in-memory cache like Redis to buy you the performance delta?

Yes, yes it is.

> That's why real world benchmarks like this one are important.

That's not what this is though. Just about nobody who has a clue is using default configurations for things like PG or Redis.

> They help people think through the problem and reassess their irrational beliefs.

Ok, but... um... you just stated that "the whole world" consumes Redis through a network connection. (Which, IMO, is the wrong tool for the job; sure, it will work, but that's not where/how Redis shines.)

> What you cannot refute are the real world numbers.

Where? This article is not that.

That is an interesting use case; I hadn't thought about a setup like this with a local Redis cache before. Are the typical advantages of using a DB over a filesystem the reason to use Redis instead of just reading from memory-mapped files?

> Are the typical advantages of using a DB over a filesystem the reason to use Redis instead of just reading from memory-mapped files?

Eh - while surely not everyone has the benefit of doing so, I'm running Laravel, and using Redis is just _really_ simple and easy. To do something via memory-mapped files I'd have to implement quite a bit of stuff I don't want/need to (locking, serialization, TTL/expiration, etc.).

Redis just works. Disable persistence, choose the eviction policy that fits the use case, configure a Unix socket connection, and you're _flying_.
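For anyone curious, a minimal sketch of that setup in `redis.conf` (values are illustrative, not a recommendation):

```ini
# redis.conf - local, cache-only Redis (illustrative values)
save ""                        # disable RDB snapshots
appendonly no                  # disable AOF persistence
maxmemory 1gb                  # cap cache size
maxmemory-policy allkeys-lru   # evict least-recently-used keys when full
unixsocket /var/run/redis/redis.sock
unixsocketperm 770
port 0                         # optionally disable TCP entirely
```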

My use case is generally data ingest of some sort. In my largest projects, the processing workers are 50-80 concurrent processes chewing through tasks from a queue (also backed by Redis), and they're likely to end up running the same queries against the database (MySQL) to get 'parent' records (i.e., the user associated with an object by username, a post by slug, etc.). There's no way to know if there will be multiples: if we're processing 100k objects there might be 1 from UserA or there might be 5000 by UserA, where each one being processed needs the object/record of UserA. This project in particular has ~40 million of these 'user' records and hundreds of millions of related objects, so we can't store/cache _all_ users locally - but we sure would benefit from not querying for the same record 5000 times in a 10-second period.

For the most part, when caching these records over the network, the performance benefit was negligible (depending on the table) compared to just querying MySQL for them. They are just `select where id/slug =` queries. But when you lose that little bit of network latency and can make _dozens_ of these calls to the cache in the time it would take to make a single networked call... it adds up real quick.
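The pattern described is plain cache-aside: check a local cache before hitting the database. A minimal Python sketch, where a dict stands in for the local Redis instance and `fetch_user` for the MySQL query (all names are made up for illustration):

```python
# Cache-aside sketch: a dict stands in for a local Redis instance.
db_calls = 0

def fetch_user(username):
    """Pretend MySQL query: `select ... where username = ?`."""
    global db_calls
    db_calls += 1
    return {"username": username, "id": hash(username) % 10_000}

cache = {}

def get_user(username):
    # Check the local cache first; fall back to the database on a miss.
    if username not in cache:
        cache[username] = fetch_user(username)
    return cache[username]

# 5000 objects by the same user trigger only one database round-trip.
for _ in range(5000):
    get_user("UserA")
print(db_calls)  # -> 1
```

With a real Redis you'd also set a TTL on each key, but the shape of the lookup is the same.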

PHP has direct "shared memory" support, but again, it would require implementing a bunch of stuff I just don't want to be responsible for - especially when it's so easy and performant to lean on Redis over a Unix socket. If I needed to go faster than this, I'd pick another language and likely do something direct-to-memory style.

I find your article valuable. It shows me what amount of configuration is needed for a reasonable expectation of performance. In the real world, I'm not going to spend effort maxing out the configuration of a single tool; not having the best-performing config for either tool is the least of my concerns. Picking either of them (or, as you suggested, Postgres) and then worrying about getting one billion requests to the service is far more important.

Thank you for the article.

My own conclusions from your data:

- Under light workloads, you can get away with Postgres. 7k RPS is fine for a lot of stuff.

- Introducing Redis into the mix has to be carefully weighed against increased architectural complexity, and having a common interface allows us to change that decision down the road.

Yeah maybe that's not up to someone else's idea of a good synthetic benchmark. Do your load-testing against actual usage scenarios - spinning up an HTTP server to serve traffic is a step in the right direction. Kudos.

[deleted]

It's not a paper in a journal, but you could at least try to run a decent benchmark. As it is, this serves no purpose other than reinforcing whatever point you started with. You didn't even tweak the postgres buffers - literally, what's the point?

I still end up recommending using postgres though, don't I?

"I'll use postgres" was going to be your conclusion no matter what I guess?

I mean what if an actual benchmark showed Redis is 100X as fast as postgres for a certain use case? What are the constraints you might be operating with? What are the characteristics of your workload? What are your budgetary constraints?

Why not just write a blog post saying "Unoptimized postgres vs redis for the lazy, running virtualized with a bottleneck at the networking level"

I even think that blog post would be interesting, and might be useful to someone choosing a stack for a proof of concept. For someone who needs to scale to large production workloads (~10,000 requests/second or more), this isn't a very useful article, so the criticism is fair, and I'm not sure why you're dismissing it offhand.

> "I'll use postgres" was going to be your conclusion no matter what I guess?

Would it bother you as well if the conclusion was rephrased as "based on my observations, I see no point in rearchitecting the system to improve the performance by this much"?

I think you are so tied to a template solution that you don't stop to think about why you're using it, or whether it's justified at all. Then, when faced with observations that challenge your unfounded beliefs, you somehow opt to get defensive? That's not right.

I completely agree that this is not relevant for anyone running such workloads, the article is not aimed at them at all.

Within the constraints of my setup, postgres came out slower but still fast enough. I don't think I can quantify what fast enough is though. Is it 1000 req/s? Is it 200? It all depends on what you're doing with it. For many of my hobby projects which see tens of requests per second it definitely is fast enough.

You could argue that caching is indeed redundant in such cases, but some of those have quite a lot of data that takes a while to query.

That's the point: you put in no effort and did what you had already decided to do.

I don't think this is a fair assessment. Had my benchmarks shown, say, that postgres crumbled under heavy write load then the conclusion would be different. That's exactly why I decided to do this - to see what the difference was.

Of course you didn't see postgres crumble. This is still a toy example of a benchmark. Nobody spins up (let alone pays for) a postgres instance to use exclusively as a cache. It is guaranteed that even in the simplest of deployments some other app (if not many of them) will be the main postgres tenant.

Add an app that actually uses postgres as a database and you will probably see its performance crumble, as the app will contend with the cache for resources.

Nobody asked for benchmarking as rigorous as you would have in a published paper. But toy examples are toy examples, be it in a publication or not.

[flagged]

That can't have felt great having your tantrum spotlighted by the author.