Redis vs. Dragonfly Scalability and Performance

A thorough benchmark comparison of throughput, latency, and memory utilization between Redis and Dragonfly.

February 27, 2023

In-memory databases have become one of the most important pieces of infrastructure underpinning modern, performance-oriented applications. Redis is the most popular of these because of its simplicity: it’s quick to set up, easy to use, and scalable to meet the demands of a growing business.

However, Redis was designed over a decade ago, and due to its simple single-threaded architecture, scaling Redis deployments can be a long and frustrating experience for developers. Scaling and maintaining Redis clusters is complex and time-consuming, and if you don’t maintain your Redis cluster correctly, you may lose data, which can lead to user experience issues or even a site outage.

Alternatively, you can use Dragonfly. Dragonfly works as a drop-in replacement for Redis and Memcached — it’s a modern in-memory database that’s been developed to build upon the success of its predecessors, while improving performance and reliability. Its multi-threaded architecture allows for more effective vertical scaling, so you can avoid the headache that comes with horizontal scaling and cluster management while still enjoying the benefits of scaling your website.

In this article, we explain the main features of Dragonfly and show how it performs versus Redis (OSS version) on benchmarking tests for throughput, latency, and memory usage.

Redis: Simple to get up and running, complex to scale

Redis executes commands on a single thread, which means it can only process one command at a time. Even if the machine it’s running on has multiple cores, it can’t make use of them to execute commands in parallel. While Redis can be vertically scaled (by increasing the memory and processor speed of the hardware it runs on), the gains in performance quickly plateau: adding more CPU cores to a machine won’t make a single-threaded process get anything done any faster.

To break through this limitation, your only option is horizontal scaling — adding more Redis servers, each running a single-threaded Redis process — and managing them as part of a cluster. Redis clusters are notoriously difficult to set up and maintain (unhealthy nodes can’t be easily replaced and must be manually configured, and snapshotting must be configured on each node separately) and they add complexity to your infrastructure, meaning you may end up needing specialist DevOps staff to run and maintain your cluster.

To add to the frustration of maintaining an unwieldy Redis cluster, Redis can run out of memory and crash, causing you to lose all your data. Redis needs to run many background processes to maintain a healthy state, and the workload often becomes unreasonably large for its single CPU core to handle, causing the database to crash. If this happens, your site may go down or there may be other serious impacts on user experience. This instability means it’s important for you to regularly snapshot your Redis database to avoid losing data, but unfortunately the snapshotting process itself also uses a lot of memory, leaving you once again at risk of a crash.

To combat these resource and stability issues, Redis users tend to massively over-provision their servers due to fear of data loss — either by having more cluster nodes than they actually need, which affects latency, or by provisioning machines with much larger memory capacity than they ordinarily require. They often end up paying for multiple times the amount of RAM that they need, just to cover the memory spikes from snapshotting.

Redis can also be affected by regular latency spikes. Because it’s single-threaded, any background process that needs to be run regularly causes a huge spike in latency, as there are no other threads to do the work. During periods of high latency, throughput drops. If this happens regularly, your users may become annoyed at a system that seems flaky.

While Redis remains popular, these issues are still unaddressed, and continue to waste time and resources for many organizations. Our frustrations with Redis led us to develop our own drop-in replacement that solves the most common issues faced by Redis users.

Dragonfly solves the biggest problems with Redis

Dragonfly is a drop-in Redis replacement designed to simplify production and boost performance. It scales vertically to support millions of operations per second and terabyte-sized workloads on a single instance.

Compared to Redis’ single-threaded process, Dragonfly makes use of all hardware resources made available to it via its multi-threaded processing capabilities. This allows it to scale vertically in a way that leads to much higher throughput on high-spec machines — without the stress of managing a cluster. Dragonfly’s ability to make full use of all available CPU cores also makes it much cheaper to run, as you can deploy much smaller instance sizes of Dragonfly than you would need for Redis (for comparable sizes of data), and simply up-spec your server when additional resources are required.

We want everyone to enjoy the benefits of our faster, more reliable in-memory database, so we made Dragonfly subject to a Business Source License, meaning it costs nothing to use and the source code is made available. You can check out the code on GitHub and install it right away.

Benchmarking comparison between Dragonfly and Redis

In order to show the considerable performance advantage that Dragonfly has over Redis, we are sharing our benchmarking results with you here. To show the most direct comparison, we compared a single instance of Redis with a single instance of Dragonfly.

We ran our tests on AWS Graviton2 EC2 instances, which are network-optimized and provide the best performance for web applications. We used memtier_benchmark — a widely used benchmarking tool developed by Redis Ltd. — to test throughput and latency, and Prometheus to monitor and visualize memory usage.

Test #1: Throughput

The throughput of a database is measured by the number of operations it can process per second. This is one of the most important metrics for database performance.

We filled the Redis and Dragonfly databases with 10 million keys each, to simulate databases that are in use, and then ran a memtier_benchmark test against the Redis and Dragonfly instances.

As you can see from the graph below, the throughput for the multi-threaded Dragonfly far exceeds that of Redis.

[Figure: throughput chart]

Dragonfly throughput is far higher than Redis for both GET and SET operations.

How to repeat these tests for yourself

If you want to test these performance gains on your own infrastructure, you can follow this guide:

  • Install Redis version 7.0.4 and Dragonfly version 0.15.0 onto separate AWS EC2 instances (we used c6gn.16xlarge).

  • Edit the Redis configuration file:

    • protected-mode no: allows you to run the memtier_benchmark command from outside of the Redis server.
    • Change bind 127.0.0.1 to bind 0.0.0.0: makes Redis listen on all network interfaces, so that remote clients can connect.
    • save "": disables snapshotting.
    • enable-debug-command yes: enables debug commands, which you can use to add dummy data to the database.
  • Start Dragonfly with the command:

    • ./dragonfly-aarch64 --proactor_threads=64
    • The --proactor_threads parameter specifies how many threads will be allocated to Dragonfly.
  • Start Redis.

  • Fill each database with 10 million keys, to simulate a database that is in use:

    • debug populate 10000000 key 550

  • Run this memtier_benchmark command for each database:

    • memtier_benchmark -p 6380 --ratio=<0:1 for GET, 1:0 for SET> --hide-histogram --threads=<2 for Redis, 64 for Dragonfly> --clients=30 --requests=200000 --distinct-client-seed --data-size 256 --expiry-range=500-500 -s <IP address for Redis/Dragonfly server>
    • Note that memtier_benchmark reads --ratio as Set:Get, so 1:0 issues only SET commands and 0:1 issues only GET commands.

memtier_benchmark is a versatile tool, so be sure to check out the user manual if you want to tweak the test parameters to reflect your own usage scenarios.
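
To make the four runs (GET and SET, against each server) easier to reproduce, here is a minimal Python sketch that assembles the command lines from the parameters listed above and works out how many requests each run sends. It only builds and prints the commands; it does not execute them. The IP addresses are hypothetical placeholders, and the port is the 6380 used in the command above, so adjust both to match your own servers. It relies on memtier_benchmark treating --ratio as Set:Get, --clients as connections per thread, and --requests as requests per client.

```python
import shlex

# Thread counts used for each server in the benchmark command above.
THREADS = {"redis": 2, "dragonfly": 64}
CLIENTS_PER_THREAD = 30        # --clients
REQUESTS_PER_CLIENT = 200_000  # --requests

# memtier_benchmark's --ratio is Set:Get.
RATIO = {"SET": "1:0", "GET": "0:1"}


def memtier_argv(server, op, host, port=6380):
    """Build the memtier_benchmark argument list for one run."""
    return [
        "memtier_benchmark",
        "-p", str(port),
        f"--ratio={RATIO[op]}",
        "--hide-histogram",
        f"--threads={THREADS[server]}",
        f"--clients={CLIENTS_PER_THREAD}",
        f"--requests={REQUESTS_PER_CLIENT}",
        "--distinct-client-seed",
        "--data-size", "256",
        "--expiry-range=500-500",
        "-s", host,
    ]


def total_requests(server):
    """Total requests in one run: threads x clients per thread x requests per client."""
    return THREADS[server] * CLIENTS_PER_THREAD * REQUESTS_PER_CLIENT


# Hypothetical server addresses (documentation range); replace with your own.
HOSTS = {"redis": "192.0.2.10", "dragonfly": "192.0.2.11"}

for server, host in HOSTS.items():
    for op in ("GET", "SET"):
        print(shlex.join(memtier_argv(server, op, host)))
    print(f"# {server}: {total_requests(server):,} requests per run")
```

With these settings a Redis run issues 12 million requests over 60 connections, while a Dragonfly run issues 384 million requests over 1,920 connections, which is worth keeping in mind when comparing how long each run takes.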

Test #2: Latency

Database latency is a measure of how long operations take to complete. When a database has high throughput, this can often affect its latency, as processing more operations can cause delays. It's particularly important to consider the tail latency of a system. Tail latency focuses on the operations that have taken the longest time to complete, rather than the overall average time. Tail latency is reported at the higher percentiles: for example, the P99 latency is the time within which 99% of requests completed, so the slowest 1% of requests took at least that long.

Tail latency is an important metric to measure. If even 1 in 100, or 1 in 10,000, operations is unacceptably slow, then, depending on your traffic, a significant number of users could be affected, undermining confidence in your product or service. It is becoming increasingly common for service level agreements (SLAs) to demand that your product’s 99th percentile for response time is below a certain threshold.
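
To see why the average can hide a tail problem, here is a small Python sketch that computes percentiles with the nearest-rank method. The latency samples are made up for illustration and are not taken from our benchmark, and nearest-rank is just one common percentile definition; benchmarking tools may interpolate differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 980 fast requests and 20 slow ones.
latencies_ms = [0.3] * 980 + [5.0] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.3f} ms  P50={p50_ms} ms  P99={p99_ms} ms")
```

In this made-up sample the mean is about 0.39 ms and the median is 0.3 ms, yet the P99 is 5.0 ms: only the tail percentile reveals that 2% of requests are more than ten times slower than the rest.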

Dragonfly, while providing up to 30 times the throughput of Redis, does not show a significant increase in latency for the slowest 1% of requests.

Below, we have graphed the results of our test of the P99 latency of Dragonfly (using 64 cores) and Redis. It’s worth noting that we used exactly the same memtier_benchmark commands for our latency test as we did for our throughput test.

[Figure: latency chart]

The P99 latency of Dragonfly is only a little higher than that of Redis, even when the throughput is 25–30 times more than for Redis.

This graph shows that the P99 latency of Dragonfly is only slightly higher than that of Redis, despite Dragonfly’s massive throughput increase – it's worth noting that if we were to reduce Dragonfly's throughput to match that of Redis, Dragonfly would have much lower P99 latency than Redis. This means that Dragonfly will give you significant improvements to your application performance.

Test #3: Memory efficiency during the snapshotting process

Redis is known to perform particularly slowly, and to use far more memory, while snapshots of the database are being created: bgsave forks the Redis process, and any memory page that is modified while the snapshot is being written gets copied. Dragonfly does not suffer from this issue due to its more efficient snapshotting. Below, our graph demonstrates how much more efficient Dragonfly is while snapshotting, by comparing what happens when the Redis bgsave and the Dragonfly save commands are run. These commands initiate the snapshotting processes for their respective databases.

[Figure: memory usage chart]

First, a large number of records are added to Redis and Dragonfly, causing the memory usage of both to increase to 56 and 48 GiB respectively. Next, the snapshotting process is initiated for both. This causes Redis’ memory to spike massively, whereas Dragonfly’s memory has no significant change.

Our results show that:

  • Dragonfly has an inherently smaller memory usage for the same dataset.
  • Dragonfly is faster at both loading the dataset and producing the snapshot.
  • Dragonfly's memory usage does not grow during the snapshotting process, whereas Redis almost doubles its RAM requirements.

How to repeat these tests for yourself

  • Install Redis version 7.0.4 and Dragonfly version 0.15.0 onto separate AWS EC2 instances (we used r6gd.4xlarge).
  • Edit the Redis configuration file:
    • protected-mode no: allows you to run commands against Redis from outside of the Redis server.
    • Change bind 127.0.0.1 to bind 0.0.0.0: makes Redis listen on all network interfaces, so that remote clients can connect.
    • enable-debug-command yes: enables debug commands, which you can use to add dummy data to the database.
  • Start Dragonfly with the command:
    • ./dragonfly --dir "/mnt/vol1/"
  • Start Redis.
  • Fill each database with 100 million keys, to simulate a database that is in use.
    • debug populate 100000000 key 500
  • See our snapshotting video demonstration for further details.
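
As a sanity check on the numbers, the following Python sketch does the back-of-the-envelope arithmetic for this dataset. It assumes that the 56 GiB and 48 GiB figures quoted for the chart correspond to the 100 million keys with 500-byte values created by the populate command above, and it treats "almost doubles" as an upper bound of 2x for Redis during bgsave. The chart readings are rounded, so the results are rough estimates only.

```python
GIB = 1024 ** 3  # bytes per GiB

# Arguments of `debug populate 100000000 key 500` from the steps above.
key_count = 100_000_000
value_size_bytes = 500

# Raw value payload only; key names and per-entry metadata come on top.
payload_gib = key_count * value_size_bytes / GIB

# Approximate memory usage after loading, as quoted for the chart (GiB).
observed_gib = {"Redis": 56, "Dragonfly": 48}

print(f"Raw value payload: {payload_gib:.1f} GiB")
for name, used_gib in observed_gib.items():
    overhead_gib = used_gib - payload_gib
    print(f"{name}: {used_gib} GiB used, about {overhead_gib:.1f} GiB beyond the raw values")

# Assumption: "almost doubles" is treated as an upper bound of 2x for Redis.
redis_peak_upper_gib = 2 * observed_gib["Redis"]
print(f"Redis snapshot peak: up to ~{redis_peak_upper_gib} GiB")
print(f"Dragonfly snapshot peak: ~{observed_gib['Dragonfly']} GiB (no significant change)")
```

Under these assumptions, a machine sized for Redis needs headroom for roughly twice its steady-state memory to snapshot safely, whereas Dragonfly can be sized close to the dataset itself.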

Redis vs. Dragonfly benchmarks: Results summary

As the above benchmark results show, Dragonfly substantially outperforms Redis. Dragonfly throughput is up to 30 times higher than Redis, while at the same time, P99 latency only increases by around 0.2 ms.

The other high-impact performance improvement is that the snapshotting process — which uses up so much memory in Redis — causes no noticeable spikes in memory usage in Dragonfly.

If you are deploying a new application, or looking to scale an existing Redis deployment, Dragonfly can save you time and resources, both in the initial implementation of an in-memory database and with ongoing scaling and maintenance tasks. Dragonfly is a high-performance drop-in alternative to Redis that won’t suffer from resource exhaustion, and is ready to expand with your product as it grows.

Dragonfly is Redis compatible, with better performance and effortless scaling

We built Dragonfly to solve Redis’ prevalent performance issues and, as our results show, we have succeeded.

Redis is an amazing technology that has been utilized by millions of developers to get a cache up and running quickly. We built Dragonfly so developers would have something just as easy to use, but with better performance and much greater reliability.

Making the easy switch from Redis to Dragonfly is a one-step solution to Redis’ scalability and saturation issues. You can get started in just a few minutes by running a Dockerized version of Dragonfly and calling it with exactly the same commands you would use to call Redis.
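
To illustrate what "exactly the same commands" means in practice, here is a minimal Python sketch that encodes a command in the Redis wire format (a RESP array of bulk strings) using only the standard library. Because a drop-in replacement accepts the same protocol, the bytes sent are identical whichever server is listening. The host and port in the commented-out call are assumptions for a local test instance, and the helper names are ours for illustration; in a real application you would keep using your existing Redis client library.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def send_command(host, port, *args, timeout=2.0):
    """Send one command and return the raw reply bytes (no reply parsing)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command(*args))
        return conn.recv(4096)


# The request bytes do not depend on which server will receive them.
print(encode_command("SET", "greeting", "hello"))

# Against a running server (hypothetical local address), a health check
# could look like this:
# print(send_command("127.0.0.1", 6379, "PING"))
```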