Threads and processes are the most common mechanisms for running background processing. In this post, I would like to introduce message queues as a healthy alternative for running background tasks. This scheme has proven to be a lot more resilient and scalable while allowing fine-grained control over resource consumption.
What are message queues
If you are not familiar with message queues, I highly recommend reading this well-written introductory article by the IBM Cloud Education team. The following excerpt describes the basic concept:
“Message queues use a point-to-point messaging pattern, in which one application (called the sender) submits a message to the queue and another application (called the receiver) gets the message from the queue and consumes it.” (Source: https://www.ibm.com/cloud/learn/message-queues)
RabbitMQ and Amazon SQS are popular point-to-point message queue systems that can handle thousands of messages with ease without requiring a lot of coding. There are also high-performance pub/sub message queues, such as Apache Kafka and Amazon Kinesis, that can scale to millions of messages. For today’s post, I will focus on point-to-point message queuing systems.
Background Processing with Message Queues
Because of the asynchronous nature of message processing, we can use messages to call into different parts of the application code. This is done by putting a message that contains the function name and its arguments on the queue, and running a copy of our application in “worker” mode. All the worker needs to do is call the function with the given arguments.
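To make the idea concrete, here is a minimal sketch of a worker that calls a function named in a message. It uses Python's in-memory `queue.Queue` as a stand-in for a real broker; the `REGISTRY`, `job`, `enqueue`, and `work_once` names are hypothetical and exist only for this illustration.

```python
import json
import queue

# Hypothetical registry mapping job names to callables.
REGISTRY = {}

def job(func):
    """Register a function so a worker can look it up by name."""
    REGISTRY[func.__name__] = func
    return func

@job
def add(a, b):
    return a + b

def enqueue(q, name, *args, **kwargs):
    """Serialize the call as a message and put it on the queue."""
    q.put(json.dumps({"func": name, "args": args, "kwargs": kwargs}))

def work_once(q):
    """Worker step: take one message and call the named function."""
    message = json.loads(q.get())
    func = REGISTRY[message["func"]]
    return func(*message["args"], **message["kwargs"])

jobs = queue.Queue()  # in-memory stand-in for a real broker
enqueue(jobs, "add", 2, 3)
result = work_once(jobs)
```

With a real broker, `enqueue` would run in the main application and `work_once` would run in a loop inside the separate worker process.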
These kinds of message queues are often called “job queues” or “worker queues”. With appropriate coding, we can easily switch between async and sync calls by choosing between posting a message and calling the function directly.
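One possible way to make that switch a single decision is a small helper that either calls the function or posts it as a message. This is only a sketch: `run` and `send_email` are made-up names, and the in-memory `queue.Queue` stands in for a real broker.

```python
import queue

jobs = queue.Queue()  # in-memory stand-in for a real broker

def send_email(to, subject):
    return f"sent '{subject}' to {to}"

def run(func, *args, background=False):
    """Call func directly, or post it as a message for a worker."""
    if background:
        jobs.put((func.__name__, args))
        return None  # the result is produced later by a worker
    return func(*args)

# Same call site, two execution modes.
sync_result = run(send_email, "a@example.com", "Hi")
queued_result = run(send_email, "a@example.com", "Hi", background=True)
```

The caller gets the return value immediately in the sync case and nothing in the background case, so this approach suits jobs whose result the caller does not need right away.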
Let’s discuss some of the benefits of using such a system compared to using Threads or Processes.
One of the main benefits of such systems over threads and secondary processes for handling background jobs is resilience: it is hard to recover from a thread or process failure. Message queues come with built-in recovery schemes, where a message that fails processing can be configured to go back on the queue for another worker to pick up.
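The requeue-on-failure behaviour can be sketched as follows. Real brokers implement this through acknowledgements and redelivery settings; the attempt counter, the `MAX_ATTEMPTS` limit, and the `dead` list below are assumptions made for this in-memory illustration.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry limit for this sketch

def process_with_requeue(q, handler):
    """Process every message; requeue failures until MAX_ATTEMPTS is reached."""
    done, dead = [], []
    while not q.empty():
        body, attempts = q.get()
        try:
            done.append(handler(body))
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                q.put((body, attempts + 1))  # back on the queue for a retry
            else:
                dead.append(body)  # give up after the last attempt
    return done, dead

calls = {"flaky": 0}

def handler(body):
    """Example handler: 'flaky' fails once, 'broken' fails every time."""
    if body == "flaky":
        calls["flaky"] += 1
        if calls["flaky"] < 2:
            raise RuntimeError("transient failure")
    if body == "broken":
        raise ValueError("always fails")
    return body.upper()

q = queue.Queue()
for body in ("ok", "flaky", "broken"):
    q.put((body, 0))
done, dead = process_with_requeue(q, handler)
```

Here the flaky message succeeds on its second attempt, while the message that always fails is set aside after three attempts instead of cycling forever.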
Fine-Grained Control
With a message queue system, we can scale the worker processes up or down independently of the main application, and therefore churn through the jobs faster or slower as the need arises. We can also run the workers on the same or different hardware, as dictated by the CPU and memory requirements of the jobs. This is much harder and more complicated to achieve with threads and processes.
Some worker queue systems allow messages to be scheduled. This is achieved by putting the message in a holding pattern until the scheduled time, at which point it is made available to the workers, who process it like any other message.
Occasionally, we might run into a situation where the queue is overwhelmed and takes too long to process. Scaling up the workers may not be the best idea when only a few of the messages need to be processed quickly and the rest are not urgent. This can be solved by setting up “low” and “high” priority queues and having the workers go through the high-priority messages first.
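The holding pattern can be pictured with a small in-memory sketch. `DelayedQueue` is a hypothetical class, not the API of any particular broker, and time is passed in as a plain number so the example stays deterministic.

```python
import heapq
import itertools

class DelayedQueue:
    """Holds messages until their scheduled time, then releases them."""

    def __init__(self):
        self._held = []  # min-heap ordered by scheduled time
        self._order = itertools.count()  # tie-breaker for equal times

    def schedule(self, run_at, message):
        heapq.heappush(self._held, (run_at, next(self._order), message))

    def release_due(self, now):
        """Return the messages whose scheduled time has arrived."""
        due = []
        while self._held and self._held[0][0] <= now:
            due.append(heapq.heappop(self._held)[2])
        return due

dq = DelayedQueue()
dq.schedule(run_at=30, message="send reminder")
dq.schedule(run_at=10, message="rebuild index")
early = dq.release_due(now=5)   # nothing is due yet
mid = dq.release_due(now=15)    # only the first job is due
late = dq.release_due(now=60)   # the remaining job is due
```

Released messages would then be placed on the regular queue, where workers treat them like any other message.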
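A minimal sketch of the two-queue setup, assuming in-memory queues in place of a real broker and a hypothetical `next_message` helper that a worker would call in its loop:

```python
import queue

high = queue.Queue()
low = queue.Queue()

def next_message():
    """Return the next message, preferring the high-priority queue."""
    for q in (high, low):
        try:
            return q.get_nowait()
        except queue.Empty:
            continue
    return None  # both queues are empty

low.put("nightly report")
low.put("cleanup")
high.put("password reset email")
order = [next_message() for _ in range(4)]
```

Even though the urgent message was enqueued last, it is handed out first. One trade-off of this simple scheme is that a steady stream of high-priority messages can starve the low-priority queue.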
While background worker queues may help alleviate some of the common challenges of using threads and processes, they are not without challenges of their own.
For one, they don’t really help with multi-threading issues. We must still use locks and semaphores and deal with race conditions and similar issues, such as lock starvation and deadlocks. This article may be a good refresher if you need one.
Second, depending on our setup, critical hardware and software failures can result in lost messages. Most popular message brokers (as they are called) support high-availability mechanisms to deal with such catastrophic failures.
However, we can still overflow the system by enqueuing messages faster than the workers can process them. This interesting read from Slack Engineering describes the problem and some of the solutions they used to solve it.
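One common guard against this kind of overflow is to bound the queue and have producers handle rejection. The sketch below illustrates that general idea with Python's in-memory `queue.Queue`; it is not the approach described in the Slack article, and `try_enqueue` is a hypothetical helper.

```python
import queue

jobs = queue.Queue(maxsize=2)  # small bound chosen for the example

def try_enqueue(message):
    """Enqueue if there is room; otherwise report that the queue is full."""
    try:
        jobs.put_nowait(message)
        return True
    except queue.Full:
        return False  # caller can retry later, shed load, or alert

accepted = [try_enqueue(f"job-{i}") for i in range(4)]
```

With no worker draining the queue, only the first two jobs are accepted. The rejections give the producer an early signal that workers are falling behind.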
Further Reading and Code Examples
Message queues are a great way to run background processes with reliability and scale. Here is a post with a code sample on how to use RabbitMQ for background processing. Also, check out Resque, PHP-Resque, Celery (and example), and even an AWS Lambda-based solution for background job processing.
You may also consider reading Scaling To Millions for more ideas on improving application performance.