Migrating to Dragonfly Cloud using RedisShake

Introduction

In previous blog posts, we have looked into various techniques for migrating to Dragonfly, including using Sentinel and RIOT. These methods have proven to be effective for many users, providing robust and reliable ways to transition their workloads to our modern high-performance in-memory data store.

One of the core design principles of Dragonfly is its compatibility with Redis and Memcached, ensuring it fits into the wider ecosystem of in-memory data stores. This commitment to compatibility opens up a wide range of migration options.

Recently, we announced Dragonfly Cloud, which makes this a good time to explore RedisShake, a versatile and configurable tool for migrating in-memory data workloads from other Redis-compatible service providers to Dragonfly Cloud. RedisShake simplifies the migration process, letting you move your data with minimal downtime and effort. Let's dive in!

The RedisShake Tool

RedisShake, originally developed by Alibaba Group, is a powerful tool designed to facilitate the migration of Redis data across different environments. Its ease of use, versatility, and configurability make it an ideal choice for anyone looking to migrate their in-memory data seamlessly.

RedisShake supports a wide array of use cases, from migrating data between cloud-based Redis environments, such as AWS, GCP, and Alibaba Cloud, to moving data from local or standard Redis instances. This flexibility ensures that RedisShake can accommodate various migration scenarios, regardless of your current setup.

RedisShake uses a TOML configuration file to define the source and destination Redis-compatible instances, as well as other parameters such as log level and batch size. As the simplest example, the following configuration file migrates data from a local instance running on port 6379 to another local instance running on port 6380:

[sync_reader]
address = "127.0.0.1:6379"

[redis_writer]
address = "127.0.0.1:6380"

We will explore more advanced configurations later in this blog post.

Why Migrate to Dragonfly Cloud?

Migrating to Dragonfly Cloud offers several compelling benefits that can enhance the performance and reliability of your in-memory data operations. Dragonfly Cloud is based on the Dragonfly source-available community edition and is designed to deliver superior performance, scalability, and ease of use, making it an attractive option for organizations looking to optimize their data infrastructure. Here are a few reasons why you should consider migrating to Dragonfly Cloud:

High Performance: Dragonfly Cloud leverages advanced multi-threading and optimized data structures to provide faster data access and processing times.
Scalability: With Dragonfly Cloud, you can easily scale your instances to handle increased workloads without compromising on performance.
Compatibility: Dragonfly Cloud is fully compatible with Redis protocols and commands, ensuring a seamless transition for existing Redis users.
Managed Service: As a fully managed service, Dragonfly Cloud takes care of maintenance, updates, monitoring, configurable backups, configurable network settings, and more.

Before migrating to Dragonfly Cloud, let's make sure that you have an instance ready, which is a very straightforward process. You can simply sign up for an account and have your Dragonfly instance up and running in minutes. For more details about Dragonfly Cloud benefits, features, and pricing, please check out our announcement blog post.
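Before starting a migration, it can help to confirm that the new instance is reachable from the machine that will run RedisShake. The following is a minimal sketch, assuming redis-cli is installed with TLS support and accepts rediss:// URIs. The function name is_reachable and the REDIS_CLI override variable are illustrative, and the endpoint is a placeholder.

```shell
# Sketch: confirm an instance is reachable before starting a migration.
# ASSUMPTION: a TLS-enabled redis-cli is on the PATH; REDIS_CLI can point elsewhere.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Succeeds (exit status 0) only if the server at the given URI answers PING with PONG.
is_reachable() {
    reply=$("$REDIS_CLI" -u "$1" PING 2>/dev/null) || return 1
    [ "$reply" = "PONG" ]
}

# Example usage with a placeholder endpoint; the rediss:// scheme requests TLS.
# is_reachable "rediss://default:XXXXX@XXXXX.dragonflydb.cloud:6385" && echo "destination is ready"
```

If your redis-cli build does not accept a URI, the same check can be made with the -h, -p, --user, --pass, and --tls options instead.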

Migrating with RedisShake - The Scan Mode

Now that we have our Dragonfly Cloud instance ready, let's explore how we can use RedisShake to migrate our data. RedisShake offers two primary modes for data migration: Scan Mode and Sync Mode. It also provides an RDB Mode for restoring data from RDB files, but we will focus on the Scan and Sync modes in this blog post.

The Scan mode operates by scanning the entire in-memory dataset of your source Redis instance and writing it into your new Dragonfly instance. This mode uses the [SCAN](https://www.dragonflydb.io/docs/command-reference/generic/scan) and [DUMP](https://www.dragonflydb.io/docs/command-reference/generic/dump) commands to read data from the source, and the serialized values are then restored on the destination.

Let's walk through the steps to migrate your data using RedisShake in Scan mode. Assume that we have a source Redis instance running on Redis Cloud and a destination Dragonfly Cloud instance, both of which are accessible via their respective endpoints. The configuration file for RedisShake could look like this:

[scan_reader]
cluster = false
address = "redis-XXXXX.c123.us-east-1-4.ec2.redns.redis-cloud.com:12129"
username = "default"
password = "XXXXX"
tls = false
scan = true
dbs = [0]     # Database(s) to scan, we only want to migrate database '0'.
ksn = true    # Set to true to enable keyspace notifications (KSN) subscription.
count = 100   # Number of keys to scan per iteration.

[redis_writer]
cluster = false    # Target Dragonfly instance is not a cluster.
sentinel = false   # Target Dragonfly instance does not use Sentinel.
master = ""        # Only used when 'sentinel' is true.
address = "XXXXX.dragonflydb.cloud:6385"
username = "default"
password = "XXXXX"
tls = true
off_reply = false

The configuration file is largely self-explanatory: it specifies the source Redis instance under the [scan_reader] section and the destination Dragonfly instance under the [redis_writer] section. Once ready, you can compile and run RedisShake with the following commands:

# Clone the RedisShake repository.
git clone https://github.com/tair-opensource/RedisShake.git

# Build the RedisShake binary.
cd RedisShake
sh build.sh

# Run RedisShake with the configuration file above.
./redis-shake shake.toml
#=> INF load config from file: shake.toml
#=> INF log_level: [info], log_file: [/Users/XXXXX/RedisShake/data/shake.log]
#=> INF changed work dir. dir=[/Users/XXXXX/RedisShake/data]
#=> INF GOMAXPROCS defaults to the value of runtime.NumCPU [10]
#=> INF not set pprof port
#=> INF create ScanStandaloneReader: redis-XXXXX.c123.us-east-1-4.ec2.redns.redis-cloud.com:12129
#=> INF create RedisStandaloneWriter: XXXXX.dragonflydb.cloud:6385
#=> INF not set status port
#=> INF start syncing...
#=> INF [reader_redis-XXXXX.c123.us-east-1-4.ec2.redns.redis-cloud.com_12129] scanStandaloneReader dump finished.
#=> INF [reader_redis-XXXXX.c123.us-east-1-4.ec2.redns.redis-cloud.com_12129] scanStandaloneReader restore finished.
#=> INF all done

As you can see, RedisShake provides detailed logs to track the progress of the migration. Once the process is complete, you can verify that your data has been successfully migrated to Dragonfly Cloud. It is worth mentioning that:

The SCAN command guarantees that keys present for the entire duration of a full iteration will be returned, but keys written during the scan might be missed, and keys deleted during the scan might still be written to the destination. This can be addressed through the ksn = true configuration, which subscribes to key changes and migrates them as they occur.
In addition, the SCAN and DUMP commands can consume a significant amount of CPU resources on the source instance.
Moreover, switching to the new Dragonfly instance might require a brief downtime to ensure data consistency, as new writes during the switch-over might be lost.

With these limitations in mind, the Scan mode is still the most straightforward and widely compatible way to migrate, and it requires minimal effort.
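One way to carry out the verification mentioned above is to compare key counts on both sides and spot-check for missing keys. The sketch below assumes redis-cli is installed and accepts connection URIs. The function names and the REDIS_CLI override variable are illustrative, and the URIs in the usage comments are placeholders.

```shell
# Sketch: post-migration checks with redis-cli.
# ASSUMPTION: redis-cli is on the PATH; REDIS_CLI can point to another binary.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Print the number of keys in the database selected by the URI (DBSIZE).
key_count() {
    "$REDIS_CLI" -u "$1" DBSIZE
}

# Report whether two key counts agree; returns non-zero on mismatch.
compare_counts() {
    if [ "$1" -eq "$2" ]; then
        echo "OK: both instances hold $1 keys"
    else
        echo "MISMATCH: source=$1 destination=$2"
        return 1
    fi
}

# Print every source key that does not exist on the destination.
# Iterates the source with SCAN (redis-cli --scan) and issues one EXISTS per key,
# so it suits spot checks on small datasets. Key names containing newlines
# are not handled.
missing_keys() {
    "$REDIS_CLI" -u "$1" --scan | while IFS= read -r key; do
        found=$("$REDIS_CLI" -u "$2" EXISTS "$key" </dev/null)
        [ "$found" = "1" ] || printf '%s\n' "$key"
    done
}

# Example usage (placeholder endpoints):
# src="redis://default:XXXXX@redis-XXXXX.c123.us-east-1-4.ec2.redns.redis-cloud.com:12129/0"
# dst="rediss://default:XXXXX@XXXXX.dragonflydb.cloud:6385/0"
# compare_counts "$(key_count "$src")" "$(key_count "$dst")"
# missing_keys "$src" "$dst"
```

Counts can legitimately differ while the source is still receiving writes or while keys with a TTL are expiring, so these checks are most meaningful once writes to the source have stopped.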

Migrating with RedisShake - The Sync Mode

The Sync mode is another migration option provided by RedisShake. In Sync mode, RedisShake presents itself as a replica of the source Redis instance. It receives the full dataset as an RDB (Redis Database) file and subsequent incremental changes as an AOF (Append-Only File) stream. The RDB file is then converted into individual commands and applied to the target Dragonfly instance. Incremental commands from the AOF stream are applied afterward to ensure data consistency.

Let's take a look at an example configuration file for RedisShake in Sync mode:

[sync_reader]
cluster = false
address = "127.0.0.1:6379"
username = "default"
password = "XXXXX"
tls = false
sync_rdb = true   # Set to true to sync RDB file.
sync_aof = true   # Set to true to sync AOF file.

[redis_writer]
# Same as the previous example.

The main difference here is that we use the [sync_reader] section instead of [scan_reader]. Connection credentials stay the same for the destination Dragonfly instance. Note that we are using a local Redis instance as the source in this example, but you can replace it with any other service provider if the PSYNC command is supported and enabled.

Also, we set sync_rdb and sync_aof both to true to make sure RedisShake performs a full sync first using the RDB file and then applies incremental changes using the AOF stream. The same ./redis-shake shake.toml command can be used to start the migration process.

Once the destination Dragonfly instance is in sync, you can switch your application to use the new instance. Any new writes during the switch-over will be captured by the AOF stream and applied to the target instance as well. Now let's summarize the pros and cons of the Sync mode.

Sync Mode Benefits

Better Data Consistency: Sync mode offers better data consistency since it operates similarly to a replication process, ensuring that changes are continuously propagated to the target instance.
Less Performance Impact: This mode has a lower performance impact on the source instance than Scan mode. The source Redis instance still needs to perform a BGSAVE operation, though, and because BGSAVE forks the server process, it can cause extra memory usage and latency spikes on large datasets.
Minimal Downtime: You can switch to the target Dragonfly instance with little to no downtime, which makes this mode well suited to live environments.

Sync Mode Limitations

The Sync mode heavily relies on the PSYNC command, which is a standard Redis command but might be blocked or not supported by some service providers. For example:

Alibaba Cloud: Requires a special account with the permission to run PSYNC.
AWS ElastiCache: Requires opening a support ticket to enable the PSYNC command.
AWS MemoryDB: PSYNC is not supported. Only the Scan mode can be used.
GCP Memorystore: PSYNC is not supported. Only the Scan mode can be used.
Redis Cloud: PSYNC is not supported. Only the Scan mode can be used.

By using RedisShake Sync mode, you can achieve a more consistent and low-impact migration to Dragonfly Cloud. This method is particularly useful when minimal downtime and high data consistency are crucial, and it should be preferred whenever the PSYNC protocol is supported.

Conclusion

RedisShake is an excellent tool that offers a robust and flexible solution for migration. Whether you choose the Scan mode for its compatibility or the Sync mode for its data consistency and minimal performance impact, RedisShake provides you with the versatility to handle various migration scenarios effectively.

To get started, consult the RedisShake documentation to learn more about its capabilities and to choose the most suitable migration strategy for your needs. RedisShake also supports advanced data processing options, such as filtering specific key prefixes during migration, allowing you to tailor the process to your specific requirements.
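As a rough illustration of the key-prefix filtering mentioned above, the sketch below appends a filter section to a RedisShake configuration file. It assumes a RedisShake release that supports a [filter] section with an allow_key_prefix option, so check the shake.toml shipped with your version before relying on it. The function name append_prefix_filter and the prefixes are hypothetical.

```shell
# Sketch: append a key-prefix filter to a RedisShake config file.
# ASSUMPTION: the [filter] section and its allow_key_prefix option, as found in
# recent RedisShake releases. The prefixes below are hypothetical examples.
append_prefix_filter() {
    cat >> "$1" <<'EOF'

[filter]
allow_key_prefix = ["user:", "session:"]   # Only keys with these prefixes are migrated.
EOF
}

# Example usage:
# append_prefix_filter shake.toml
```

If the configuration file already contains a [filter] section, edit that section instead of appending a second one, since TOML does not allow the same table to be defined twice.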

If you found this blog post interesting, take the next step towards optimizing your data infrastructure! Try Dragonfly Cloud today and experience a high-performance, scalable in-memory data store, with RedisShake helping to make the migration seamless.