I see that the author took a 'heuristical' approach to retrying tasks (a task has a predetermined amount of time it is expected to take, and is considered failed if it wasn't updated in time) and uses SQS. If the solution is homemade anyway, I can only recommend leveraging your database's transactionality for this — a common pattern I have often seen recommended and have also successfully used myself:

- At processing start, update the schedule entry to 'executing', then open a new transaction and lock it, skipping tasks that are already locked (`SELECT FOR UPDATE ... SKIP LOCKED`). Keep the transaction open while processing.

- At the end of processing, set it to 'COMPLETED' and commit. This also releases the lock.
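A minimal sketch of those two steps, assuming a hypothetical `jobs` table with `id`, `status`, and `payload` columns and any Postgres DB-API cursor (e.g. psycopg2) — all names here are illustrative, not from the original system:

```python
# Runs inside an open transaction; the caller's COMMIT releases the lock.
CLAIM_SQL = """
    SELECT id, payload
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
"""

def claim_next_job(cur):
    """Lock one pending job, skipping rows other workers already hold,
    and mark it 'executing'. Returns None if nothing is claimable."""
    cur.execute(CLAIM_SQL)
    row = cur.fetchone()
    if row is None:
        return None                      # queue empty, or everything locked
    job_id, payload = row
    cur.execute("UPDATE jobs SET status = 'executing' WHERE id = %s", (job_id,))
    return job_id, payload

def finish_job(cur, job_id):
    """At the end of processing: mark COMPLETED; committing afterwards
    releases the row lock."""
    cur.execute("UPDATE jobs SET status = 'COMPLETED' WHERE id = %s", (job_id,))
```

The key point is that the row lock lives exactly as long as the worker's transaction, so a dead worker's claim evaporates on its own.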

This has the following nice characteristics:

- You can have parallel processors polling tasks directly from the database without another queueing mechanism like SQS, and have no risk of them picking the same task.

- If you find an unlocked task in 'executing', you know the processor died for sure. No heuristic needed.

This introduces long-running transactions, which at least in Postgres should be avoided.

Depends what else you’re running on it; it’s a little expensive, but not prohibitively so.

Long-running transactions interfere with vacuuming and increase lock contention. Everything depends on your workload, but a long-running transaction holding an important lock is an easy way to bring down production.

If the system is already using SQS, DynamoDB has this locking library, which is lighter weight for this use case:

https://github.com/awslabs/amazon-dynamodb-lock-client

> The AmazonDynamoDBLockClient is a general purpose distributed locking library built on top of DynamoDB. It supports both coarse-grained and fine-grained locking.

I read too many "use Postgres as your queue (pgkitchensink is in beta)", now I'm learning listen/notify is a strain, and so are long transactions. Is there a happy medium?

Just stop worrying and use it. If and when you actually bump into the limitations, then it's time to sit down and think and find a supplement or replacement for the offending part.

Excellent advice across many domains/techs here.

t1: select for update where status=pending, set status=processing

t2: update, set status=completed|error

these are two independent, very short transactions? or am i misunderstanding something here?

--

edit:

i think i'm not seeing what the 'transaction at start of processor' logic is; i'm thinking more of a polling logic

    while true:
      r := select for update ... skip locked
      if r is None:
        sleep a bit   # nothing pending; back off
        continue
      process r, mark completed, commit
this obviously has the drawback of knowing how long to sleep for; and tasks not getting "instantly" picked up, but eh, tradeoffs.

Your version makes sense. I understood the OP's approach as being different.

Two short transactions (very short, if indexed properly) at start and end are a good solution. One caveat is that the worker can die after t1 but before t2 - hence jobs need a timeout concept and should be idempotent for safe retrying.

This gets you "at least once" processing.
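The timeout concept can be sketched as a periodic sweep that flips stale jobs back to pending. A runnable toy version (sqlite3 standing in for Postgres; table and column names are made up for illustration):

```python
import sqlite3

def requeue_stale_jobs(conn, timeout_seconds):
    """Jobs stuck in 'processing' longer than the timeout are assumed
    dead and put back to 'pending' for a retry. Returns how many were
    requeued — safe only if jobs are idempotent."""
    cur = conn.execute(
        """
        UPDATE jobs
        SET status = 'pending'
        WHERE status = 'processing'
          AND started_at < datetime('now', ?)
        """,
        (f"-{timeout_seconds} seconds",),
    )
    conn.commit()
    return cur.rowcount
```

In Postgres the same sweep would compare against `now() - interval '...'`; either way it is what turns "worker died between t1 and t2" from a lost job into a delayed one.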

> this obviously has the drawback of knowing how long to sleep for; and tasks not getting "instantly" picked up, but eh, tradeoffs.

Right. I've had success with exponential-backoff sleeps. In a busy system, that means sleeps remain either zero or very short.
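A sketch of that backoff idea: the sleep resets whenever a job is found and doubles (up to a cap) while the queue is empty. `fetch_job` and `handle` are hypothetical callables, and `max_idle_polls` exists only so the loop can terminate:

```python
import time

def poll_with_backoff(fetch_job, handle,
                      min_sleep=0.01, max_sleep=5.0, max_idle_polls=None):
    """Poll fetch_job(); on a hit, handle it and reset the backoff.
    On a miss, sleep and double the delay up to max_sleep."""
    sleep = min_sleep
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        job = fetch_job()
        if job is not None:
            handle(job)
            sleep = min_sleep   # busy system: stay near zero-latency
            idle = 0
        else:
            time.sleep(sleep)   # empty queue: back off exponentially
            sleep = min(sleep * 2, max_sleep)
            idle += 1
```

Under load the loop effectively never sleeps; when the queue drains, it settles at the cap instead of hammering the database.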

Another solution is Postgres LISTEN/NOTIFY: workers listen for events and PG wakes them up. On the happy path, this gets instant job pickup. This should be allowed to fail open and understood as a happy path optimization.

As delivery can fail, this gets you "at most once" processing (which is why this approach by itself is not enough to drive a persistent job queue).

A caveat with LISTEN/NOTIFY is that it doesn't scale due to locking [1].

[1]: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

What are your thoughts on using Redis Streams, or using a table instead of LISTEN/NOTIFY (either a table per topic, or a table with a compound primary key that includes a topic - possibly a temporary table)?

I've not used Redis Streams, but it might work. I've seen folks advise against PG, in favor of Redis for job queues.

> using a table instead of LISTEN/NOTIFY

What do you mean? The job queue is backed by a PG table. You could optionally layer LISTEN/NOTIFY on top.

I've had success with a table with compound, even natural primary keys, yes. Think "(topic, user_id)". The idea is to allow for PARTITION BY should the physical tables become prohibitively large. The downsides of PARTITION BY don't apply for this use case, the upsides do (in theory - I've not actually executed on this bit!).

Per "topic", there's a set of workers which can run under different settings (e.g. number of workers to allow horizontal scaling - under k8s, this can be automatic via HorizontalPodAutoscaler and dispatching on queue depth!).

They're proposing doing it in one transaction as a heartbeat.

> - If you find an unlocked task in 'executing', you know the processor died for sure. No heuristic needed

Yes, and that cannot work: if a task is unlocked but in the 'executing' state, how was it unlocked without its state being updated?

If a worker/processor dies abruptly, it will neither unlock nor set the state appropriately. It won't have the opportunity. Conceptually, this failure mode can always occur (think, power loss).

If such a disruption happened, yet you later find tasks unlocked, they must have been unlocked by another system. Perhaps Postgres itself, with a killer daemon to kill long-running transactions/locks. At which point we are back to square one: the job scheduling should be robust against this in the first place.

This is exactly what we're doing. Works like a charm.

Don't have to keep transaction open. What I do is:

1. Select next job

2. Update status to executing where jobId = thatJob and status is pending

3. If previous affected 0 rows, you didn't get the job, go back to select next job

If you have "time to select" <<< "time to do", this works great. But if the two are closer, you can see how this will mostly cause contention, and you shouldn't do it.
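Those three steps, sketched runnably (sqlite3 standing in for Postgres; the table and status names are illustrative). The atomic `UPDATE ... WHERE status = 'pending'` plus the affected-row check is what makes the claim race-safe without holding a transaction open:

```python
import sqlite3

def claim_job(conn):
    """Select a candidate, then atomically flip pending -> executing.
    rowcount 0 means another worker won the race; go pick again."""
    while True:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row is None:
            return None                  # nothing pending
        job_id = row[0]
        cur = conn.execute(
            "UPDATE jobs SET status = 'executing' "
            "WHERE id = ? AND status = 'pending'",
            (job_id,),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id                # we got the job
        # 0 rows affected: someone else claimed it first; loop and retry
```

When workers mostly race for the same head-of-queue row, almost every loop iteration is a wasted round trip — which is the contention caveat above.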