From what I recall, Reddit uses AWS extensively. Could they not have replaced RabbitMQ with SQS? You get near-unlimited horizontal scalability, extremely good uptime, and guaranteed at-least-once message delivery. And if a worker crashes, its messages become visible again after the visibility timeout, since the worker never deleted them.
SQS could not handle the volume or latency requirements we had, and it was too expensive compared to running it ourselves at the time.
I think there is a hard limit on the number of in-flight messages (that is, messages that have been dequeued by a worker but whose job has not yet completed). I wouldn't be surprised if Reddit hit those sorts of volumes.
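To make the two mechanics in this thread concrete, here's a toy in-memory model of a visibility timeout plus a cap on in-flight messages. It's only a sketch: `VisibilityQueue`, its method names, the tiny cap of 2, and the explicit `now` clock are all made up for illustration. This is not the SQS API, and raising an error at the cap is an assumption of the toy, not a claim about how the real service responds.

```python
class VisibilityQueue:
    """Toy queue with a visibility timeout (hypothetical, not the SQS API)."""

    def __init__(self, visibility_timeout, max_in_flight):
        self.visibility_timeout = visibility_timeout
        self.max_in_flight = max_in_flight  # illustrative cap, not a real quota
        self._next_id = 0
        self._messages = {}         # message id -> body, in send order
        self._invisible_until = {}  # message id -> time it becomes visible again

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        return self._next_id

    def in_flight(self, now):
        # Received but not deleted, and the visibility timeout has not expired.
        return sum(1 for t in self._invisible_until.values() if t > now)

    def receive(self, now):
        # Messages whose timeout has expired become visible again.
        self._invisible_until = {
            mid: t for mid, t in self._invisible_until.items() if t > now
        }
        if len(self._invisible_until) >= self.max_in_flight:
            raise RuntimeError("in-flight limit reached")
        for mid, body in self._messages.items():
            if mid not in self._invisible_until:
                self._invisible_until[mid] = now + self.visibility_timeout
                return mid, body
        return None  # nothing visible right now

    def delete(self, mid):
        # A worker deletes a message only after finishing the job.
        self._messages.pop(mid, None)
        self._invisible_until.pop(mid, None)


q = VisibilityQueue(visibility_timeout=30, max_in_flight=2)
for job in ("job-a", "job-b", "job-c"):
    q.send(job)

first = q.receive(now=0)    # worker 1 takes job-a, then crashes (never deletes)
second = q.receive(now=0)   # worker 2 takes job-b
q.delete(second[0])         # worker 2 finishes, so job-b is gone for good
third = q.receive(now=10)   # job-c is handed out; job-a is still invisible

try:
    q.receive(now=20)       # job-a and job-c are both in flight: cap of 2 hit
    hit_limit = False
except RuntimeError:
    hit_limit = True

redelivered = q.receive(now=30)  # job-a's timeout expired, so it comes back
```

The crashed worker's message is never lost, it just waits out the timeout. The cap only bites when consumers hold many messages at once, which is the scenario being guessed at above.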