Introduction
Dragonfly is a modern, high-performance in-memory data store designed as a drop-in replacement for Redis. Its multi-threaded shared-nothing architecture and optimized data structures make it an ideal choice for handling high-throughput and low-latency workloads.
BullMQ, on the other hand, is a popular background job processing library for Node.js that provides robust task scheduling capabilities and is often used in conjunction with Redis for its backend storage. Dragonfly is also an officially supported backend storage option for BullMQ, allowing developers to leverage its enhanced performance and scalability for their message queuing needs.
In our previous blog posts, we covered how to run BullMQ with a Dragonfly instance (with benchmark results) and detailed the journey the Dragonfly and BullMQ teams undertook together to achieve a 30x throughput improvement. In this post, we will focus on strategies for scaling heavy BullMQ workloads with our managed cloud offering, Dragonfly Cloud.
Recap: Hashtags in Queue Names
In Redis and Dragonfly, a hashtag is a mechanism used to ensure that related keys are stored in the same shard in a clustered environment.
This is achieved by using curly braces {} in the key name, allowing all keys with the same hashtag to be grouped together. For instance, if a key name contains a hashtag, as in bullmq:{email-group-001}:email-queue-001, the hashtag portion {email-group-001} is used (instead of the whole key) as the input to the hash function that determines the shard where the key should live.
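To make this concrete, here are a few of the keys BullMQ creates for a queue named email-queue-001 with the prefix bullmq:{email-group-001} (shown for illustration; the exact set of keys varies by BullMQ version). Because they all share the hashtag {email-group-001}, they all hash to the same slot and therefore live on the same shard:

bullmq:{email-group-001}:email-queue-001:wait
bullmq:{email-group-001}:email-queue-001:active
bullmq:{email-group-001}:email-queue-001:completed
bullmq:{email-group-001}:email-queue-001:failed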
When using BullMQ, it's a good practice to prefix your queue names to avoid key collisions between different applications, and to include hashtags in your queue names to ensure that related queues are managed by the same Dragonfly thread. On Dragonfly's side, enabling optimal performance requires specific configuration. You need to start Dragonfly with the following server flags:
$> ./dragonfly --cluster_mode=emulated --lock_on_hashtags
These settings, combined with using hashtags in your queue names, ensure the best possible performance for your BullMQ workloads running on Dragonfly.
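If you run a self-managed instance with Docker, for example, the same flags can be appended to the container command (a minimal sketch, assuming the official image and the default port):

$> docker run -p 6379:6379 docker.dragonflydb.io/dragonflydb/dragonfly \
   --cluster_mode=emulated --lock_on_hashtags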
Using Dragonfly Cloud
If you're using Dragonfly Cloud, optimizing your BullMQ workloads becomes significantly easier.
While setting server flags like --cluster_mode=emulated and --lock_on_hashtags on a self-managed Dragonfly instance is straightforward, it still requires setup and ongoing management to a certain extent.
With Dragonfly Cloud, however, you can skip this step entirely. All you need to do is ensure that the "BullMQ" checkbox under the "Specializations" section is checked when configuring your instance, as shown below:
This automatically applies the necessary settings, allowing you to focus more on your application and much less on managing the underlying infrastructure.
Creating a Queue and Adding Jobs in BullMQ
To get started with BullMQ using Dragonfly Cloud as the backend storage, the first step is to establish a connection and define a queue. In the example below, we create a queue and add a job to it with a few configurable options.
import IORedis from "ioredis";
import { Queue } from "bullmq";

// Connect to Dragonfly Cloud using the provided URL,
// which contains the necessary credentials.
// Note that Dragonfly uses the same wire protocol as Redis.
const connection = new IORedis(
  "redis://default:XXXXX@XXXXX.dragonflydb.cloud:6385",
  {
    // BullMQ requires this option to be null on connections used by workers.
    maxRetriesPerRequest: null,
  }
);

// Create a queue named 'bullmq:{email-group-001}:email-queue-001'.
// A hashtag is used for optimized key distribution and performance.
const queue = new Queue("email-queue-001", {
  connection: connection,
  prefix: "bullmq:{email-group-001}",
});

// Add a job to the queue with options for delay, priority, etc.
await queue.add(
  // Job name.
  "send-email",
  // Job data.
  {
    recipient: "joe@test.com",
    subject: "Hello from BullMQ & Dragonfly Cloud!",
  },
  // Job configuration options.
  {
    delay: 100, // Delay in milliseconds.
    priority: 1, // Priority from 1 (highest) to 2,097,152 (lowest).
    removeOnComplete: {
      age: 60, // Lazily remove the job after 60 seconds.
      count: 10, // Keep at most 10 completed jobs.
    },
  }
);
There are a few key points to note in the code snippet above:
- We start by connecting to Dragonfly Cloud using the IORedis library. Make sure the correct URL and credentials are provided.
- A queue (i.e., the producer) is created with a connection to the Dragonfly Cloud instance. Within the queue name, a hashtag is used to group related queues together and ensure optimal performance.
- A job is added to the queue with various options: delay specifies the delay before the job is processed, priority determines the job's execution order relative to other jobs, and removeOnComplete controls the automatic removal of completed jobs.
It's important to pay attention to the removeOnComplete option, as it controls how long completed job data is retained in Dragonfly. This option allows you to specify both the age and the maximum count of completed jobs to keep. BullMQ also supports the removeOnFail option, which determines how long failed jobs are retained. A good practice is to keep only a handful of completed jobs but a much larger number of failed jobs for debugging, as shown in the sketch below. These configurations are crucial for controlling how much job data is stored in Dragonfly, allowing you to tailor your setup to your business requirements while balancing storage consumption. Choosing the right settings can help optimize performance and minimize storage costs.
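Here is a minimal sketch of these retention options, reusing the queue from above (the values are illustrative, not recommendations):

await queue.add(
  "send-email",
  { recipient: "joe@test.com" },
  {
    removeOnComplete: {
      age: 3600, // Keep completed jobs for at most 1 hour.
      count: 100, // Keep at most 100 completed jobs.
    },
    removeOnFail: {
      age: 24 * 3600, // Keep failed jobs for up to 24 hours for debugging.
    },
  }
);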
Creating a Worker to Process Jobs in BullMQ
Now that we have a queue set up to produce jobs, let's create a worker (i.e., the consumer) to process these jobs. The worker will listen to the queue and execute the job tasks as they come in.
import IORedis from "ioredis";
import { Worker, Job } from "bullmq";

// Reuse the same connection settings as the producer.
// BullMQ requires maxRetriesPerRequest to be null for worker connections.
const connection = new IORedis(
  "redis://default:XXXXX@XXXXX.dragonflydb.cloud:6385",
  { maxRetriesPerRequest: null }
);

// Create a worker to process jobs from the queue created earlier.
const worker = new Worker(
  "email-queue-001",
  async (job: Job) => {
    console.log(`Email ${job.id} sending...`);
    // Perform the job task here, for instance, calling an email service.
    return "Success!";
  },
  {
    connection: connection,
    prefix: "bullmq:{email-group-001}",
  }
);
// Handle worker events.
worker.on("completed", (job) => {
console.log(`Email ${job.id} sent. Job removed from queue.`);
});
// Graceful shutdown for the worker.
process.on("SIGTERM", async () => {
await worker.close();
await connection.quit();
});
Again, there are a few key points to note in the consumer code snippet above:
- We create a Worker instance that listens to the queue created previously. The worker function processes each job that comes in and returns a result. In this example, it returns "Success!" upon completion, which would be stored in Dragonfly.
- The worker is set up to handle events such as "completed", which triggers a log when a job is successfully processed and removed from the queue. This provides visibility into the job processing lifecycle. The "failed" event can be handled in the same way, as shown in the sketch after this list.
- To ensure a graceful shutdown, we listen for the "SIGTERM" signal. When this signal is received, we close the worker and disconnect from Dragonfly Cloud.
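For completeness, here is a minimal sketch of a "failed" event handler (the logging is illustrative):

// Log failures so that unprocessable jobs remain visible.
worker.on("failed", (job, err) => {
  console.log(`Email ${job?.id} failed: ${err.message}`);
});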
By implementing a worker, we can efficiently process jobs from the queue, leveraging Dragonfly Cloud's robust backend to scale seamlessly according to workload demands. This setup ensures that your job processing is both reliable and performant, with the flexibility to handle varying loads. Note that on the worker side, you can also configure various options, such as concurrency, which controls how many jobs a single worker is allowed to run in parallel (see the sketch below). Auto-removal options can also be set on the worker side, although based on my experiments, auto-removal options on the queue side take precedence over those on the worker side.
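As a minimal sketch, concurrency is set through the worker options (the value of 10 is illustrative and should be tuned to your workload):

const concurrentWorker = new Worker(
  "email-queue-001",
  async (job: Job) => {
    // Each invocation handles one job; up to 10 may run at once.
    return "Success!";
  },
  {
    connection: connection,
    prefix: "bullmq:{email-group-001}",
    concurrency: 10, // Maximum number of jobs this worker processes in parallel.
  }
);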
What Users Are Saying
This blog post isn't focused on benchmarking the performance of Dragonfly Cloud with BullMQ. If you're interested in detailed performance metrics and comparisons, please refer to our previous blog post on the subject here.
From my own experiments, I found that when queues and workers are configured properly, it's quite challenging to fully saturate a Dragonfly Cloud instance. For example, I ran a test using a 50GB Dragonfly Cloud instance, and even with multiple concurrent producers and consumers hitting the queues, the instance remained largely idle. This demonstrates the high performance and scalability of Dragonfly Cloud when handling BullMQ workloads. But don't just take my word for it—consider the testimonials from Dragonfly users who have experienced similar results.
Conclusion
Scaling BullMQ workloads with Dragonfly Cloud offers a streamlined and powerful solution for managing high-performance background job queues. By leveraging the ease of configuration and robust capabilities of Dragonfly Cloud, developers can focus more on their application logic and less on infrastructure management. While this post has highlighted the potential for optimized performance, the true power of Dragonfly Cloud is best experienced through real-world use. Don't hesitate to give it a try and see the benefits for yourself!