Having the ability to set a TTL on the cache key is a critical feature of a cache, not something that can be tacked on later.
I always find these "don't use redis" posts kind of strange. Redis is so simple to operate at any scale that I don't quite get why it's important to remove it.
I keep hoping Postgres will one day have the ability to mark a timestamp column as an expiry column. It would be useful for all kinds of things beyond caching, including session tokens, feature flags, background jobs, rate limiting, delayed deletion (as a variant of soft deletion), etc.
It seems like the autovacuum could take care of these expired rows during its periodic vacuum. The query planner could automatically add a condition that excludes any expired rows, preventing expired rows from being visible before autovacuum cleans them up.
One could use a trigger for this. All we need is to set up a trigger that, on each write, deletes all records whose expiry timestamp has passed. That would eat up some latency, but as was said, most projects would find it good enough anyway.
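Something like this, as a rough sketch (table and column names are placeholders):

    -- hypothetical cache table with an expiry column
    CREATE TABLE cache (
        key        text PRIMARY KEY,
        value      jsonb NOT NULL,
        expires_at timestamptz NOT NULL
    );

    -- purge expired rows as a side effect of every write
    CREATE FUNCTION purge_expired_cache() RETURNS trigger AS $$
    BEGIN
        DELETE FROM cache WHERE expires_at < now();
        RETURN NULL;  -- return value is ignored for AFTER statement triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER cache_purge
        AFTER INSERT OR UPDATE ON cache
        FOR EACH STATEMENT EXECUTE FUNCTION purge_expired_cache();

The DELETE inside doesn't re-fire the trigger (it only fires on insert/update), so there's no recursion, but it does run on every write, which is where the latency goes.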
Probably better to use a partitioned table and drop old partitions.
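Sketch of what that could look like (names made up; you'd have to schedule the partition churn yourself):

    -- cache partitioned by insertion time; expiry = dropping a partition
    CREATE TABLE cache (
        key         text NOT NULL,
        value       jsonb NOT NULL,
        inserted_at timestamptz NOT NULL DEFAULT now()
    ) PARTITION BY RANGE (inserted_at);

    CREATE TABLE cache_2024_06_01 PARTITION OF cache
        FOR VALUES FROM ('2024-06-01') TO ('2024-06-02');

    -- later: expiring a whole day of entries is a cheap metadata operation
    DROP TABLE cache_2024_06_01;

No per-row deletes and no vacuum debt, at the cost of only expiring in partition-sized chunks.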
I use pg_cron for this. But I don't need the TTL to be accurate to the minute, or even to the hour.
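The whole job is roughly this (assuming the pg_cron extension is installed and the cache table has a created_at column):

    -- once an hour, delete anything older than an hour
    SELECT cron.schedule(
        'expire-cache',
        '0 * * * *',
        $$DELETE FROM cache WHERE created_at < now() - interval '1 hour'$$
    );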
It's not like it's bad, it's more like cutting down on the number of systems you need to operate.
I have been running Redis for years as a cache and have spent less than 5 cumulative minutes "operating" it.
I'm a big "just use Postgres" fan but I think Redis is sufficiently simple and orthogonal to include in the stack.
Postgres nationalists will applaud the conclusion no matter how bad the reasoning is.
Don't get me wrong, the idea that he wants to just use an RDBMS because his needs aren't great enough is a perfectly inoffensive conclusion. The path that led him there is very unpersuasive.
It's also dangerous. Ultimately the author is willing to do a bit more work rather than learn something new. This works because he's using a popular tool people like. But overall, he doesn't demonstrate he's even thought about any of the things I'd consider most important; he just sort of assumes running a Redis is going to be hard and he'd rather not mess with it.
To me, the real question is just cost vs. how much load the DB can even take. My most important Redis cluster basically exists to take load off the DB, which comes under heavy load even from simple queries. Using the DB as a cache only works if your issue is expensive queries.
I think the appeal is that this guy reaches the conclusion people want to hear, and it's not an unreasonable conclusion, but it creates the illusion that the reasoning he used to get there was solid.
I mean, if you take the same logic, cross out the word Postgres, and write in "Elasticsearch," and now it's an article about a guy who wants to cache in Elasticsearch because it's good enough, and he uses the exact same arguments about how he'll just write some jobs to handle expiry--is this still sounding like solid, reasonable logic? No, it's crazy.
> Having the ability to set a TTL on the cache key is a critical feature of a cache, not something that can be tacked on later.
What exactly is the challenge you're seeing? At the very least, you can save an expiry timestamp as part of the db entry. Your typical caching strategy already involves revalidating the cache before it expires, and it's not as if returning stale while revalidating is something completely unheard of.
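Concretely, something like this (a sketch; the table and names are invented for illustration):

    -- a minimal cache table
    CREATE TABLE cache (
        key        text PRIMARY KEY,
        value      jsonb NOT NULL,
        expires_at timestamptz NOT NULL
    );

    -- read path: never serve an expired entry
    SELECT value FROM cache
    WHERE key = $1 AND expires_at > now();

    -- write path: upsert with a fresh expiry
    INSERT INTO cache (key, value, expires_at)
    VALUES ($1, $2, now() + interval '5 minutes')
    ON CONFLICT (key) DO UPDATE
        SET value = EXCLUDED.value,
            expires_at = EXCLUDED.expires_at;

Expired rows sit around until something cleans them up, but they're never served, so the cleanup can be as lazy as you like.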
Yeah, the article was like "I always need a DB anyway" but then sets up an extra cronjob to expire keys, plus more code. I get YAGNI and avoiding deps, but this is really extra stuff to deal with.
Maybe Postgres could use a caching feature. Until then, I'm gonna drop in Redis or memcached instead of reinventing the wheel.
Expiring keys in Postgres with a created_at column and a pg_cron job is very easy (at least, if you're comfortable in Postgres). Redis is world class though of course, and can be deployed turn-key in basically any environment. If you're more comfortable in Redis than Postgres, more power to you. Different choices can be pragmatic to different people.
Personally for a greenfield project, my thinking would be that I am paying for Postgres already. So I would want to avoid paying for Redis too. My Postgres database is likely to be underutilized until (and unless) I get any real scale. So adding caching to it is free in terms of dollars.
I'm comfy with Postgres though, like I'll center my entire backend around it and do the heavy lifting in SQL (never ORM). It's more that I don't want to depend on a cronjob for something as fundamental as a cache.
Usually Postgres costs a lot more than Redis if you're paying for a platform. Like a decent Redis or memcached in Heroku is free. And I don't want to waste precious Postgres connections or risk bogging down the whole DB if there's lots of cache usage, which actually happened last time I tried skipping Redis.
I can understand being nervous about some cron job running on some other service, but what's concerning about a cron job managed inside of Postgres with pg_cron? If that doesn't run, your database is probably down anyway.
Postgres might cost more but I'm probably already paying. I agree that exhausting connections and writing at a high rate are easy ways to bring down Postgres, but I'm personally not going to worry about exhausting connections to Postgres until I have at least a thousand of them. Everything has to be considered within the actual problem you are solving; there are definitely situations where starting out with a cache makes sense.
I might be ok with this if it were built in, but pg_cron is an extension, so first off you might not even have access to it. And then you still have to monitor it.
Seems like it's available on all the major providers.
Heroku doesn't have it. That's actually kinda annoying that they don't, cause the others do. AND they no longer have free Redis, so that changes things a bit.
Edit: well a tiny bit, max $3/mo
> Usually Postgres costs a lot more than Redis if you're paying for a platform.
You need to back up your unbelievable assertion with facts. A memory cache is typically far more expensive than a simple database, especially as provisioning the same memory capacity as RAM is orders of magnitude more expensive than storing the equivalent data in a database.
Why would you cache the entire database though? Seems like an apples to oranges comparison.
> Why would you cache the entire database though?
I have no idea where you got that from.
> especially as provisioning the same memory capacity as RAM is orders of magnitude more expensive than storing the equivalent data in a database.
I'm not sure how else to interpret this
Why are you conflating memory capacity with a requirement to store the whole database in memory? I mean, think. What do you think is the biggest performance bottleneck with caches, and how does this relate to memory capacity?
I didn't say it's cheaper for the same cache size. But yeah a base tier Redis that will carry a small project tends to be a lot cheaper than the base tier Postgres.
> I didn't say it's cheaper for the same cache size.
So be specific. What exactly did you want to say?
> But yeah a base tier Redis that will carry a small project tends to be a lot cheaper than the base tier Postgres.
This is patently false. I mean, some cloud providers offer NoSQL databases with sub-20ms performance as part of their free tier.
Just go ahead and provide any evidence, any at all, that supports the idea that Redis is cheaper than Postgres. Any concrete data will do.
Look at the Heroku pricing. If you don't like Heroku then look at AWS pricing. Specifically for Postgres, not a NoSQL DB (which Redis can be too)
> Yeah, the article was like "I always need a DB anyway" but then sets up an extra cronjob to expire keys, plus more code.
You do not need cron jobs to run a cache. Sometimes you don't even need a TTL. All you need is a way to save data so that it is easier and cheaper to retrieve. I feel these comments just misinterpret what a cache is by confusing it with what some specific implementation does. Perhaps that's why we see expensive and convoluted strategies using Redis and the like when they are absolutely not needed at all.
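For instance, a TTL-free sketch (hypothetical table names): entries live until the underlying data changes, and the writer invalidates them in the same transaction.

    -- write-through invalidation: no TTL, no cron
    BEGIN;
    UPDATE products SET price = $2 WHERE id = $1;
    DELETE FROM cache WHERE key = 'product:' || $1;
    COMMIT;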
If we don't use a TTL, aren't we going to have to either accept that our cache will grow without bounds or take an even more sophisticated approach (like tracking access times instead of creation times)? Is there something simpler I'm not seeing?
> If we don't use a TTL, aren't we going to have to either accept that our cache will grow without bounds (...)
Do you have a bound? I mean, with Redis you do, but that's primarily a cost-driven bound.
Nevertheless, I think you're confusing the point of a TTL. TTLs are not used to limit how much data you cache. The whole point of a TTL is to be able to tell whether a cache entry is still fresh or it is stale and must be revalidated. Some cache strategies do use the TTL to determine which entries to evict, but that is just a scenario that takes place when memory is at full capacity.
A TTL doesn't really tell you if it's stale though. It gives you an upper bound on how long it can have been stale. But something becomes stale when the underlying resource is written to, which can happen an hour or an instant after you cache it. You should probably evict it when the write comes in. In my mind, it's for evicting things that aren't in use (to free up memory).
> A TTL doesn't really tell you if it's stale though (...)
Non sequitur, and immaterial to the discussion.
> You should probably evict it when the write comes in.
No. This is only required if memory is maxed out and there is no more room to cache your entry. Otherwise you are risking cache misses by evicting entries that are still relatively hot.
The cache isn't the only hot thing here. Relax.
> Non sequitur, and immaterial to the discussion.
You said:
> The whole point of a TTL is to be able to tell whether a cache entry is still fresh or it is stale and must be revalidated.
So I responded to it. I don't really understand why you think that's a non sequitur.
> No.
I'm a bit confused. We're not using TTLs and we're not evicting things when they become invalid. What is your suggestion?
I've tried it this way. You can get away with no TTL if your keys are constrained. Sometimes there are enough keys to be a problem. I'd rather just set up a TTL and not worry about this.
Agreed, simple and widely applicable heuristics are great, and you can think deeply on it when and if it proves to be an issue worthy of such attention.