> if you have to check an identifier for revocation on every request you could just use an opaque session ID and look that up on every request instead!
One reason could be the size. A revocation list only needs to keep session IDs of recently logged-out sessions, for which the token's TTL hasn't yet expired. It may be a much smaller list than a list of every active session.
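To make the size argument concrete, here is a minimal sketch of such a revocation list. It assumes each token carries an ID and an expiry time; the class and method names are hypothetical, not taken from any library.

```python
import time


class RevocationList:
    """Holds IDs of revoked tokens only until those tokens would expire anyway."""

    def __init__(self):
        # token ID -> expiry time (seconds since epoch) of the revoked token
        self._revoked = {}

    def revoke(self, token_id, expires_at):
        self._revoked[token_id] = expires_at

    def is_revoked(self, token_id, now=None):
        # Past its expiry a token is rejected by the normal expiry check,
        # so the entry no longer needs to count as revoked here.
        now = time.time() if now is None else now
        expires_at = self._revoked.get(token_id)
        return expires_at is not None and expires_at > now

    def prune(self, now=None):
        """Drop entries whose token has expired; return how many were dropped."""
        now = time.time() if now is None else now
        expired = [tid for tid, exp in self._revoked.items() if exp <= now]
        for tid in expired:
            del self._revoked[tid]
        return len(expired)

    def __len__(self):
        return len(self._revoked)
```

With short TTLs, `prune` keeps the list bounded by the number of logouts per TTL window rather than by the number of active sessions.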
Also, a JWT (or a Macaroon, etc) can store a large amount of details about the session in a cryptographically secure, unforgeable way. This rids you of the necessity to store all that in your active session database, again cutting the size.
As someone who operates a PostgreSQL database containing 27 billion SSL certificates, each 1-2 KB, with a bunch of secondary indexes that get inserted in random order, I find it pretty incredible that people see the need to optimize their session database. At what scale does the size of the session database actually matter?
Those stateless tokens may be "unforgeable", but they are replayable, and if you're not mindful of that you can have security vulnerabilities.
I think one meaningful case is when you have services in very different locations and, instead of making every request go to a session store in a single location, you would rather replicate the data to each location for better latency. In that case the thing you replicate is a revocation list.
We have a single harbor instance at work, backed by PG. The folks ran into some sort of performance issue during their testing (not documented) and their solution was to use cloud native pg, run two Postgres instances, and add a second layer of connection pooling.
We eventually hit a bug in a combination of certain images, docker desktop (not even supported by us), and permissions. The error looked related to the connections, so I suggested the pooler be bypassed as a test. They ignored the suggestion for two weeks until it was the last thing to test.
Sure enough bypassing the pooler helped stabilize the connection from harbor during image uploads.
Now I need to talk to them about properly scaling PG vertically before we try to run two with replication, in the same kube cluster, in the same physical data center.
You should do some basic optimizations. Fixed-length table and indexes on the unique string for fast lookups. I also like to do a rolling delete for sessions older than 30 days, unless it's a logged-in mobile session. Those get to live forever.
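A rough sketch of the indexed lookup and rolling delete, using SQLite from the standard library. The table layout, column names, and `last_seen` bookkeeping are assumptions for illustration, and SQLite has no fixed-length rows, so that part is not modeled.

```python
import sqlite3

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


def create_session_table(conn):
    conn.execute(
        """
        CREATE TABLE sessions (
            token     TEXT PRIMARY KEY,   -- unique string, indexed for fast lookup
            user_id   INTEGER NOT NULL,
            is_mobile INTEGER NOT NULL,   -- 1 = logged-in mobile session, kept forever
            last_seen INTEGER NOT NULL    -- Unix timestamp
        )
        """
    )


def purge_old_sessions(conn, now):
    """Delete non-mobile sessions not seen for 30 days; return the number removed."""
    cur = conn.execute(
        "DELETE FROM sessions WHERE is_mobile = 0 AND last_seen < ?",
        (now - THIRTY_DAYS,),
    )
    conn.commit()
    return cur.rowcount
```

Running `purge_old_sessions` from a periodic job is one way to get the rolling delete; how often to run it is a deployment choice.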
Fair enough, but those optimizations are basically free. People think stateless tokens are free but they really are not.
The cost of the stateless token is basically the CPU usage for signing the message and checking the signature with the public key on the client. Example: a Google Compute instance asks the metadata server for an OIDC token (which is a JWT). The metadata server responds with a token that basically says "here's the machine service account, here's the machine's ID, this token is proof that I am service account abc123 and it's valid for 20 seconds". This is one of the most common uses of JWTs in enterprise. You don't store them. They actually are free.
Lots of web devs get tricked into using them as primary session tokens and it's a huge anti-pattern. I see it all the time and people get aggressive about it.
The cost is the vigilance required to use them safely. It's not just compute/storage costs.
I didn't downvote you. You're absolutely right. Implementation of anything is work.
> Fair enough, but those optimizations are basically free. People think stateless tokens are free but they really are not.
Strawman.
The only requirement for a JWT is publishing the JSON Web Key Set with the public keys used to verify the JWT's signature. That's the full cost of a no-frills JWT implementation if you exclude IAM.
If you want to have one-time JWTs you need to maintain a revocation list. This is literally a set of IDs. If you go nuts and use GUIDs for JWT IDs, that means each entry takes as much space as 4 ints, and all you need is a set membership check on said ID. Even at FANG scale you can handle that in a memory cache service such as Valkey running on a COTS desktop.
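A quick back-of-the-envelope version of that claim, as a sketch. The figure of 100 million outstanding revoked IDs is an assumption picked for illustration, and the calculation counts only the raw 16 bytes per ID, not the overhead of the set or cache that holds them.

```python
import uuid

BYTES_PER_ID = 16  # a GUID/UUID is 128 bits, i.e. four 32-bit ints


def raw_size_bytes(num_revoked_ids):
    """Payload size of the IDs alone, ignoring container and cache overhead."""
    return num_revoked_ids * BYTES_PER_ID


revoked = set()
token_id = uuid.uuid4()
revoked.add(token_id.bytes)  # store the 16-byte form, not the 36-char string

print(len(token_id.bytes))                # 16
print(token_id.bytes in revoked)          # True
print(raw_size_bytes(100_000_000) / 1e9)  # 1.6 (GB of raw IDs)
```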
Now show us your alternative.
Yes we have heard this before, React is only 30kb! But that misses the enormous amount of infra you need to even just do a basic fetch. (Read the post by the React Query author on whether you need React Query or not)
Likewise with JWTs for sessions you need to handle cache invalidation, revocation lists, key rotation, the list of difficult comp sci problems really does go on!
The same issue as always plaguing the frontend world: up-front “simplicity”, enormous actual complexity.
> Yes we have heard this before, React is only 30kb!
Not quite. You might be surprised to know, but the whole JOSE standard, and JWT in particular, specify a very limited set of fields. Whenever anyone starts requiring more than that, the responsibilities start to be offloaded to the likes of OpenID Connect.
This is actually really funny because I recently had this problem. I had to authenticate to an internal repo with OIDC, but the script had to run so early in the bootstrap process that the Python Google SDK was not yet installed. So I had to install the SDK before apt was available, pulling it down manually to solve the chicken-and-egg problem. My initial implementation used curl, but folks insisted (rightly) on using the official SDK. I'm sure it's a lot more than 30kb, though not at runtime per se.
> If you want to have one-time JWTs you need to maintain a revocation list.
No, you always need a revocation list if you want to handle user sessions in a secure manner. What claims do your tokens contain? If it's anything other than some stable identifiers, like user name, email, permissions, etc. then you now have a cache invalidation problem.
But if all your token carries is an identifier which you need to look up, how's this any better than a signed cookie containing the session ID? All you've done is add complexity.
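For comparison, a signed cookie that carries nothing but the session ID can be sketched with an HMAC from the standard library. The `id.signature` cookie layout, function names, and key handling are assumptions for illustration, not any framework's actual format.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side secret


def sign_session_id(session_id, key=SECRET_KEY):
    """Return 'session_id.signature' for use as a cookie value."""
    mac = hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{mac}"


def verify_cookie(cookie, key=SECRET_KEY):
    """Return the session ID if the signature checks out, else None."""
    session_id, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes to avoid timing leaks.
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return session_id
    return None
```

The signature only lets the server reject tampered or guessed IDs before touching the session store; the lookup itself still happens on every request.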
What do you do about availability? AFAIK the choice is to pick one of 3 or 4 hacky difficult-to-administer clustering solutions, or have that single PostgreSQL database be a SPOF for your whole system.
27 billion? Do you work at AWS?
The issue isn't size, it's load.
I am still waiting for Macaroons to be used widely. I think they are a fantastic invention.
It seems they were not of very much use in the past, but with the agentic-everything now, I see this as a great way of delegating permissions to subagents, third-party agents, etc.
Working on something along these lines but unfortunately I cannot dedicate as much time as I'd like.
Still, if anyone is reading, give Macaroons a try!
We have what I believe to be one of the world's largest deployments of Macaroons. They're a mixed bag, though I think they're a lot more interesting in a world where agents do most of the fiddly work.
https://fly.io/blog/operationalizing-macaroons/
I am very aware of your work!
It's the only prod usage of Macaroons I know of, I think.
Third-party discharge seems like a great way to have human-in-the-loop gating, among other interesting things.
Would be great to read your thoughts if you ever write about the agentic use case, given all the fly.io experience.
I like the raspberry ones. Or lemon is also good
Are you thinking of macarons? Macaroons are coconut.
JWTs can do that (delegate) and such capability is already well defined.
Maybe I stated it wrong. Macaroons let the holder attenuate a token (add restrictions to it) _without_ contacting the auth server, which IMO makes them a good fit for restricting and attenuating as much as you want, without much cost.
If I need a roundtrip to the auth server to attenuate, I am not necessarily going to do it as often.
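The offline attenuation described above comes from chaining HMACs: each new caveat is signed using the previous signature as the key. The following is a simplified sketch of that idea only. It is not the wire format of any real Macaroon library, it leaves out third-party caveats, and all names are hypothetical.

```python
import hashlib
import hmac


def _mac(key, msg):
    return hmac.new(key, msg, hashlib.sha256).digest()


def mint(root_key, identifier):
    """Auth server: create a token with no caveats."""
    return {"id": identifier, "caveats": [], "sig": _mac(root_key, identifier)}


def attenuate(token, caveat):
    """Any holder: add a restriction offline, using only the current signature."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _mac(token["sig"], caveat),
    }


def verify(root_key, token, caveat_holds):
    """Auth server: recompute the chain from the root key, then check each caveat."""
    sig = _mac(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False
    return all(caveat_holds(c) for c in token["caveats"])
```

A holder could call `attenuate(token, b"action = read")` before handing the token to a subagent. Removing a caveat would require the earlier signature, which the recipient never sees, so restrictions can only be added.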
Most token formats delegate. Macaroons support attenuation, confinement, and embedded third-party claims, none of which are JWT capabilities.
Another point is that managing session data on the server-side is a pain... If your app server goes down, stale session data would be left behind in your session store; it can easily become orphaned... So you need to set an expiry on it to ensure that it will be cleaned up no matter what... But you need to keep extending the expiry while the user is still online. God forbid you create the session data before you set the expiry on it and the operation that sets the expiry fails (e.g. server crashes at the precise moment or some error occurs which causes it to be skipped)... In practice, it's hard to avoid stale/orphaned session data.
And yes, you need to store and manage more data and your session store is an additional Single Point of Failure... With JWT the revocation list is optional... Your system can keep running without it; it just won't be able to ban users. It's a cleaner separation of concerns without SPoF.
JWTs have so many benefits over session IDs, I could write a book about all the benefits. Sure, there are some tradeoffs but the negatives are typically pretty minor or hand-wavy.