Zero Downtime Migration from Redis to Dragonfly using Redis Sentinel
In this blog post, you will learn how to migrate from Redis to Dragonfly using Redis Sentinel.
July 11, 2023
Introduction
Dragonfly is a modern in-memory datastore that implements novel algorithms and data structures on top of a multi-threaded, shared-nothing architecture. Thanks to its API compatibility, Dragonfly can act as a drop-in replacement for Redis. At the time of writing, Dragonfly has implemented more than 200 Redis commands, which represents good coverage for the vast majority of use cases. So why would someone want to migrate from Redis to Dragonfly? Due to Dragonfly's hardware efficiency, migrating can reduce your infrastructure costs by up to 80% while also reducing your architectural complexity.
You can use a variety of techniques to migrate data from existing Redis deployments to Dragonfly. These include Snapshot and Restore, Replication, as well as custom solutions.
- Snapshot and Restore is a commonly used technique for migrating data between Redis instances. You can configure Redis to automatically save snapshots to disk or manually call the SAVE (or BGSAVE) command. During startup, Dragonfly looks for the snapshot dump file in its working directory and loads it automatically (for further details, refer to Saving Backups in the Dragonfly documentation).
- Replication is also a widely used migration technique. It involves making the new instance a replica of the source Redis instance (using the REPLICAOF command), waiting for the data to sync, promoting the replica to primary (using REPLICAOF NO ONE), decommissioning the old primary, and then reconfiguring and restarting client applications to use the new primary node.
- In addition to the above-mentioned techniques, you can also build custom solutions using the MIGRATE command or tools such as RIOT (as discussed in this blog).
Each of these techniques has its own pros and cons, and one drawback they share is application downtime. For example, Snapshot and Restore involves potential downtime during the migration process, especially for larger datasets. Writes that arrive after the snapshot is taken are not included in it, so the time taken to create, transfer, and restore the snapshot can lead to data loss as well as service interruption. Replication also involves manual steps (reconfiguring the application) and downtime.
In this blog post, we'll discuss how you can use Redis Sentinel to perform a zero-downtime migration from Redis to Dragonfly.
Redis Sentinel
Redis Sentinel is a distributed system designed to provide high availability and automatic failover for Redis. A Redis Sentinel based setup consists of one or more Sentinel processes that monitor the health and status of Redis instances. By continuously sending heartbeat-like check-ins and performing active monitoring, Redis Sentinel detects failures in Redis master nodes and orchestrates the failover process. It promotes suitable replica nodes to become the new master, updates the system configuration, and redirects clients to the new master to ensure uninterrupted service. This ability to automatically handle failovers can be used to perform migrations as well.
At a high level, here are the steps involved:
- Start a new Dragonfly instance and configure it as a replica of the source (primary) Redis instance.
- Replicate data from source Redis (primary) node to Dragonfly.
- Allow replication to reach a steady state, monitoring it with the INFO replication command.
- Stop the primary Redis node and let Sentinel promote the Dragonfly instance to become the new primary.
Let's look at each of these steps in detail.
Migration from Redis to Dragonfly using Redis Sentinel
Clone the Git repository:
git clone git@github.com:dragonflydb/dragonfly-examples.git
cd dragonfly-examples/sentinel-migration
Start Redis Primary, Replica and Sentinel nodes
redis-server --port 6379
redis-server --port 6380 --slaveof 127.0.0.1 6379
redis-server sentinel.conf --sentinel
Note that we started Redis primary, replica and Sentinel nodes on ports 6379, 6380 and 5000 (as per sentinel.conf) respectively.
Confirm Redis master node via Sentinel:
redis-cli -p 5000 sentinel get-master-addr-by-name the_master
#expected output
1) "127.0.0.1"
2) "6379"
the_master is the name of the master node as specified in sentinel.conf file.
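To make the lookup above more concrete, here is a minimal sketch of what a Sentinel-aware client has to do before it can talk to the primary: send SENTINEL get-master-addr-by-name over the Redis wire protocol (RESP) and read back the two-element reply. It uses only the Go standard library. The helper names (encodeCommand, readStringArray, parseArrayReply, masterAddr) are hypothetical and chosen for illustration; this is not how go-redis implements its Sentinel support, and a real client also handles reconnects, multiple Sentinels, and failover notifications.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strconv"
	"strings"
	"time"
)

// encodeCommand serializes a command as a RESP array of bulk strings.
func encodeCommand(args ...string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%d\r\n", len(args))
	for _, a := range args {
		fmt.Fprintf(&b, "$%d\r\n%s\r\n", len(a), a)
	}
	return b.String()
}

// readPrefixedInt reads one CRLF-terminated line such as "*2" or "$9"
// and returns the integer that follows the expected prefix byte.
func readPrefixedInt(r *bufio.Reader, prefix byte) (int, error) {
	line, err := r.ReadString('\n')
	if err != nil {
		return 0, err
	}
	line = strings.TrimRight(line, "\r\n")
	if line == "" || line[0] != prefix {
		return 0, fmt.Errorf("unexpected reply line %q", line)
	}
	return strconv.Atoi(line[1:])
}

// readStringArray parses a RESP array of bulk strings, such as the
// [ip, port] reply to SENTINEL get-master-addr-by-name.
func readStringArray(r *bufio.Reader) ([]string, error) {
	n, err := readPrefixedInt(r, '*')
	if err != nil {
		return nil, err
	}
	if n < 0 {
		return nil, fmt.Errorf("null reply: master name not known to Sentinel")
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		size, err := readPrefixedInt(r, '$')
		if err != nil {
			return nil, err
		}
		if size < 0 {
			return nil, fmt.Errorf("unexpected null element in reply")
		}
		buf := make([]byte, size+2) // payload plus trailing CRLF
		if _, err := io.ReadFull(r, buf); err != nil {
			return nil, err
		}
		out = append(out, string(buf[:size]))
	}
	return out, nil
}

// parseArrayReply parses a reply that is already held in memory.
func parseArrayReply(raw string) ([]string, error) {
	return readStringArray(bufio.NewReader(strings.NewReader(raw)))
}

// masterAddr asks one Sentinel for the current address of masterName.
func masterAddr(sentinelAddr, masterName string) (string, error) {
	conn, err := net.DialTimeout("tcp", sentinelAddr, 2*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	_ = conn.SetDeadline(time.Now().Add(2 * time.Second))
	cmd := encodeCommand("SENTINEL", "get-master-addr-by-name", masterName)
	if _, err := io.WriteString(conn, cmd); err != nil {
		return "", err
	}
	parts, err := readStringArray(bufio.NewReader(conn))
	if err != nil {
		return "", err
	}
	if len(parts) != 2 {
		return "", fmt.Errorf("expected [ip, port], got %d elements", len(parts))
	}
	return net.JoinHostPort(parts[0], parts[1]), nil
}

func main() {
	// Port 5000 and the_master match the sentinel.conf used in this post.
	addr, err := masterAddr("127.0.0.1:5000", "the_master")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("current master:", addr)
}
```

Because the client asks Sentinel for the address instead of hard-coding it, the same lookup returns a different node after each failover, which is what lets the application follow the primary without being reconfigured.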
Confirm Redis replica status via Sentinel:
redis-cli -p 5000 sentinel replicas the_master
#expected output
1) 1) "name"
2) "127.0.0.1:6380"
3) "ip"
4) "127.0.0.1"
5) "port"
6) "6380"
.....
Start the client application
The client application is a Go program that uses the go-redis client library and connects to Redis via Sentinel (using NewFailoverClient).
To start the application:
go run main.go
#expected output
sentinel.go:685: sentinel: new master="the_master" addr="127.0.0.1:6379"
connected to redis
The application exposes a couple of HTTP endpoints to set and get key-value pairs. We will invoke these endpoints (via a script) to verify that the application works as expected and how it behaves during failover(s).
./test.sh
The script output should look like this:
{"key":"key-1","value":"value-1","from_node":"127.0.0.1:6379"}
{"key":"key-2","value":"value-2","from_node":"127.0.0.1:6379"}
....
Notice that the output contains the information of the primary node (port 6379) configured in Sentinel.
The base setup is ready and we have verified that the application works as expected. Let's begin the migration process.
Start Dragonfly and replicate data
There are several options available to get Dragonfly up and running quickly. We will be using Docker for this example.
docker run --network=host --ulimit memlock=-1 docker.dragonflydb.io/dragonflydb/dragonfly:latest --port 6000
Note that we have started Dragonfly on port 6000.
Dragonfly supports a primary/secondary replication model, similar to Redis’s replication. When using replication, Dragonfly creates exact copies of the primary instance. Once configured properly, secondary instances reconnect to the primary any time their connections break and will always aim to remain an exact copy of the primary.
To convert the Dragonfly instance into a replica of the primary Redis node we started earlier, run the following command:
redis-cli -p 6000 REPLICAOF localhost 6379
#expected output
OK
Note that replication in the opposite direction (a Redis instance replicating from a Dragonfly primary) is currently not possible.
To confirm replication, check the number of keys:
redis-cli -p 6000 DBSIZE
#expected output
(integer) 10
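A matching key count shows that data has arrived, but it does not show whether the replication link is healthy. The sketch below parses the text returned by INFO replication and checks two fields that Redis documents for replicas, master_link_status and master_sync_in_progress. It assumes the Dragonfly instance reports these fields under the same names; the sample text is illustrative, not captured from a running instance, and parseInfo and replicaReady are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInfo turns the "key:value" lines of an INFO reply into a map,
// skipping blank lines and "# Section" headers.
func parseInfo(raw string) map[string]string {
	fields := make(map[string]string)
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if key, value, ok := strings.Cut(line, ":"); ok {
			fields[key] = value
		}
	}
	return fields
}

// replicaReady reports whether the replica's link to its primary is up
// and the initial sync is no longer in progress.
// Assumption: the replica exposes both fields under these names.
func replicaReady(fields map[string]string) bool {
	return fields["master_link_status"] == "up" &&
		fields["master_sync_in_progress"] == "0"
}

func main() {
	// Illustrative sample; real output contains more fields.
	sample := "# Replication\r\n" +
		"role:replica\r\n" +
		"master_host:localhost\r\n" +
		"master_port:6379\r\n" +
		"master_link_status:up\r\n" +
		"master_sync_in_progress:0\r\n"

	fields := parseInfo(sample)
	fmt.Println("ready for failover:", replicaReady(fields))
}
```

In practice you would feed this the output of redis-cli -p 6000 INFO replication and only proceed to the failover steps once the check passes.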
Failover to Redis replica
First, stop the primary node and witness Sentinel in action. Go to the terminal where you started the Redis server and stop the primary node by pressing Ctrl+C.
In the script output, you should see the following:
{"operation":"set","msg":"dial tcp 127.0.0.1:6379: connect: connection refused","code":500}
{"operation":"get","msg":"dial tcp 127.0.0.1:6379: connect: connection refused","code":500}
...
This is expected because the primary node is down and the application is unable to connect to it. Sentinel should trigger a failover and promote the replica to become the new primary node.
If you check the script output after some time, the application should be working as usual again. Notice the from_node field: it should now show the former replica node (on port 6380):
{"key":"key-1","value":"value-1","from_node":"127.0.0.1:6380"}
{"key":"key-2","value":"value-2","from_node":"127.0.0.1:6380"}
....
You can verify this by checking the master via Sentinel as well:
redis-cli -p 5000 sentinel get-master-addr-by-name the_master
#expected output
1) "127.0.0.1"
2) "6380"
Failover to Dragonfly
Now, let's stop the new master (port 6380).
In the script output, you should see the following:
{"operation":"set","msg":"dial tcp 127.0.0.1:6380: connect: connection refused","code":500}
{"operation":"get","msg":"dial tcp 127.0.0.1:6380: connect: connection refused","code":500}
...
Just like in the previous case, this is expected because the primary node is down and the application is unable to connect to it. Sentinel will trigger a failover and promote the Dragonfly instance to be the new primary node.
If you check the script output after some time, the application should be working as usual again. Notice the from_node field: it should now show the Dragonfly instance (on port 6000):
{"key":"key-1","value":"value-1","from_node":"127.0.0.1:6000"}
{"key":"key-2","value":"value-2","from_node":"127.0.0.1:6000"}
....
You can verify this by checking the Sentinel status via CLI as well:
redis-cli -p 5000 sentinel get-master-addr-by-name the_master
#expected output
1) "127.0.0.1"
2) "6000"
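If you save the script output to a file, a small program can summarize how many requests each node served and how many failed while a failover was in progress. The sketch below assumes the two JSON line shapes shown in this post (a from_node field on success, a code field on failure); the result type and the summarize function are hypothetical names used for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// result covers both line shapes printed by the test script:
// successes carry from_node, failures carry an HTTP-style code.
type result struct {
	FromNode string `json:"from_node"`
	Code     int    `json:"code"`
}

// summarize counts successful responses per serving node and the
// number of failed requests (code 500 or above).
func summarize(output string) (perNode map[string]int, failures int) {
	perNode = make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		var r result
		if err := json.Unmarshal(scanner.Bytes(), &r); err != nil {
			continue // skip lines that are not JSON, such as "...."
		}
		if r.FromNode != "" {
			perNode[r.FromNode]++
		} else if r.Code >= 500 {
			failures++
		}
	}
	return perNode, failures
}

func main() {
	// Sample lines in the same shape as the script output in this post.
	sample := `{"key":"key-1","value":"value-1","from_node":"127.0.0.1:6380"}
{"operation":"set","msg":"dial tcp 127.0.0.1:6380: connect: connection refused","code":500}
{"key":"key-1","value":"value-1","from_node":"127.0.0.1:6000"}
....`

	perNode, failures := summarize(sample)
	fmt.Println("served per node:", perNode)
	fmt.Println("failed requests:", failures)
}
```

The failure count gives a rough measure of how long the application was affected by each failover, which is useful when tuning Sentinel timeouts or client retry settings.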
Conclusion
Migrating from Redis to Dragonfly can involve downtime, but this can be avoided using Redis Sentinel. We replicated data from Redis to Dragonfly and then let Redis Sentinel fail over to the Dragonfly instance automatically, without manually stopping or reconfiguring the application.
There are a few things to consider when using this approach:
- Sentinel client - Please ensure that the client library you use supports Sentinel. Without that, the client will not be able to connect to the Sentinel cluster and retrieve the master node information.
- Application resilience - The master node will be unavailable during failover and this will have a (temporary) impact on the application. To counter this, you need to ensure that the application is resilient to failures by having appropriate error handling, retry logic, and timeouts.
- Sentinel high availability - In this blog post, we used a single Sentinel process for demonstration purposes. For production use-cases, use a fault-tolerant Sentinel setup with a minimum of three nodes. This ensures that the Sentinel cluster itself is resilient to failures.
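As one way to approach the application resilience point above, the following sketch retries an operation with an exponentially growing wait so that requests issued during the failover window get a chance to succeed once the new primary is promoted. The retry function and the errUnavailable value are hypothetical and written with the standard library only; go-redis has its own retry settings, which are not shown here, and the attempt count and delays are placeholders to tune for your Sentinel timeouts.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable stands in for a "connection refused" error seen while
// the primary is down.
var errUnavailable = errors.New("primary unavailable")

// retry runs op up to attempts times, doubling the wait after each
// failure, and returns the last error if every attempt fails.
func retry(attempts int, wait time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errUnavailable // simulated failover window
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err) // prints "calls: 3 err: <nil>"
}
```

The total time spent retrying should comfortably exceed the time Sentinel needs to detect the failure and promote a replica, otherwise the caller gives up before the new primary is available.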