Easy Retries with RabbitMQ

Overview

Here’s the problem…you’re using RabbitMQ to solve some business problem. Basically, a message is put on an exchange, then routed to one or more queues and some other application picks the message off the queue and does something with it. But what happens if the application consuming the messages from the queue fails? Sometimes you’ll want to just move onto the next thing (maybe even log an error if you’re nice). Other times you’re going to want to retry. However, this is where the problem is…

Some background on Rabbit MQ.  Since version 2.7.0 rabbit mq tries to guarantee message ordering.  This means that RMQ does it’s best to maintain the order of messages in the queue.  This even applies to messages which are requeued after reading it for the first time.  This can lead to consumers bogged down or stuck on the failed message which can also lead to heavy load on the consumers processing the message or any of the downstream systems these consumers rely on.

First Approach

The first time I tried to do retries I did the following:

  1. Categorize the types of errors that were retryable and which were permanent
  2. Implement a way in my application to determine if an error I encountered is retryable or not.
  3. I implement a function which I can call when I encounter a retryable error event.  This is typically done by calling the reject function on a queue and passing it the the requeue flag set to true.  See https://www.rabbitmq.com/nack.html for more details.

This approach is flawed because of a couple reasons:

  1. When a message is requeued, the message is put on the head of the queue.  Which means it will be immediately picked up by a worker process.  This is undesirable because you could get stuck in a tight loop of retrying the same set of messages and you’ll never move on to newer messages which may be successful.
  2. If I mis-categorized a error as retryable when it was not, then I will retry indefinitely.  If this happens to enough message, then the messages stuck in the retry loop will block any new message from getting though.

A Better Way

I considered a few options when trying to find solutions to the previously mentioned flaws with my original approach for retries.  I looked into acking the message and republishing to the same queue/exchange.  I also added a retry counter to my message which I increment, giving me the ability to give up after n number of retries.  This is better, because I’m putting the message I’m retrying on the back of the queue and I now give up after a certain number of retries.  But there is still a possibility of retries happening in a tight loop, which is undesirable.

I want to be able to say “retry this message in 30 seconds”.  Or even have a expodential backoff so that the time between retries grows as I get closer to my max retry count.

One option for implementing this is in code.  But that will need to be done in each of my consumers and is prone to errors.  Rabbit MQ actually has a nice feature which can do this for you.  It’s called Dead Letter Exchanges (https://www.rabbitmq.com/dlx.html).  Here is an overview of how this new approach works:

  1. A message is published to an exchange is routed to a queue
  2. Consumer reads message off the queue
  3. Retryable error is caught
  4. Ack original message
  5. Update original message with a retry count of +1
  6. Publish message to Dead Letter Exchange (instead of your normal exchange) with the same routing key that was received originally.  This will put the message in a retry_message queue
  7. Message will be republished based on how you setup the Dead Letter Exchange

Setting Up the Dead Letter Exchange

So, here is what I do to get things working:

  1. Create the dead letter exchange, which is just a normal exchange with a special name
  2. Create a retry_message queue and have all messages published to the dead letter exchange route here
  3. When you setup the retry_message queue, be sure to default the following parameter values of the queue
    1. x-message-ttl: 30000 – This will set a ttl on any message published to the queue.  When the ttl expires, the message will be republished to the exchange specified in the `x-dead-letter-exchange` parameter.
    2. x-dead-letter-exchange: original_exchange_name – This is where the message will get republished to once the message ttl expires.  We normally want this be the name of the exchange where the message was originally published.

The result of the above configuration means that any message sent to the dead letter exchange will wait in the retry_message queue for 30 seconds before being put back on the tail of the original queue for reprocessing.

5 thoughts on “Easy Retries with RabbitMQ

    • I’m using the consumer to check a timestamp I set in the message to see if the message is ready for consumption. May not be the most efficient way but it gives me predictable redelivery times. I experimented with DLX feature for this, but ran into the TTL message blocking at the head of the queue. I tried setting priorities but that was no good, as messages with higher priorities (lower attempts) would get added to near the front of the queue and their TTL would block the expired low priority messages. Also tried adding a max-length to the queue but that was no good either, messages that were not yet expired, ended up being delivered.

      So I thought about it and I think what would work is to have a DLX & DL queue per each retry attempt. So attempt 1 would have the lowest back-off, would go to its own DLX (dlx_attempt_1) where say the expiration is 60 seconds. The next would be dlx_attempt_2, expiration 120 seconds, then dlx_attempt_3 expiration 240 seconds etc.. The messages added to the back of the queue would have their expiration always greater than the messages ahead of them. But this leads to n DLX/DL queues where n is the number of max retries you want to do. Not sure if its worth it over the consumer just checking the message timestamp and re-queuing it if necessary.

      • I considered creating a new queue for every different expiration time, effectively turning them into FIFOs, but that seemed too messy if you allow any arbitrary expiration. It would probably be a fine solution if you have a limited number of possible expirations.

        In the end I discovered https://github.com/rabbitmq/rabbitmq-delayed-message-exchange, which aims to add exactly the functionality we need. I decided to keep using RabbitMQ to count the delivery attempts though, so my delayed message exchange publishes (after the delay) to a queue with maxlen=0, so messages are immediately deadlettered. From there on the logic is as described in this article.

  1. We prefer to use a simpler approach. In case of error we stop the current consumer’s thread by means of a thread sleep, and after the pause we nack and requeue the message that will be reprocessed

  2. I have added three retry queues to create exponential backoff and deadLetterExchange for all the three queues are same. So, if message is unacknowledged, message is sent to retry_queue based on retry count. Original message is acknowledged and new message is generated from deadLetterExhangeQueue having retry_count incremented. So, after 3 unsuccessful attempts, messages are sent to new_queue holding all the dead messages.

Leave a comment