Batch Operations in Dragonfly: Pipelining, Transactions, and Lua Scripting
Explore advanced batch operations in Dragonfly, including pipelining, transactions, and Lua scripting, to enhance performance and data handling in high-throughput applications.
June 27, 2024
Introduction
In our previous blog posts, we explored the realm of Lua scripting and the sophistication of Dragonfly's low-level transactional framework. The framework's ability to ensure atomicity for individual commands within Dragonfly's robust, multi-threaded architecture is both fundamental and interesting to learn about. While this framework forms the backbone of Dragonfly's performance capabilities, it operates behind the scenes, seamlessly ensuring data integrity without the need for direct user intervention.
As we shift our focus to the practical application of Dragonfly, this blog post will explore the various methodologies for executing multiple commands together in a single batch. Each method offers unique advantages that can be leveraged to optimize performance and efficiency. Whether you're looking to enhance throughput, reduce latency, or ensure data consistency, understanding these batch operation techniques will empower you to make informed decisions tailored to your specific needs while working with Dragonfly.
ACID in the Context of In-Memory Data Stores
As we prepare to dive into the capabilities of batch operations in Dragonfly, it's essential to revisit the core principles encapsulated by the ACID properties: atomicity, consistency, isolation, and durability. These properties, which define transactional behavior in traditional SQL database systems, take on unique characteristics in the context of in-memory data stores.
For in-memory data stores, durability often takes a backseat.[^1] The primary allure of these systems is their ability to deliver sub-millisecond response times with high throughput, which can be at odds with the overhead of persisting every operation on disk. While durability is crucial for ensuring data persistence after transactions, many in-memory databases trade on-disk durability for speed, focusing instead on delivering lightning-fast read and write operations.
Consistency in traditional databases often involves stringent enforcement of rules such as constraints, cascades, and checks to ensure that all transactions lead to a valid state. However, in-memory data stores like Dragonfly typically operate under a more relaxed definition of consistency. They ensure that the database doesn't enter an invalid state, but they might not enforce complex constraints as strictly as traditional databases. Consistency here often means that operations within the database behave as expected and adhere to basic integrity rules. Applications may require additional mechanisms to enforce a higher level of consistency, depending on their specific needs.
In today's discussion, while we will touch upon consistency and durability, our focus will largely remain on understanding and optimizing the performance implications of atomicity and isolation in Dragonfly, which are critical for ensuring that batch operations execute reliably and efficiently.
Pipelining: Efficient Command Processing
Pipelining is an indispensable technique supported by Dragonfly that significantly improves the efficiency of executing multiple commands. With pipelining, a client sends a series of commands to a Dragonfly instance back to back, without waiting for the reply to each individual command. This drastically reduces the impact of network latency: instead of paying one round-trip per command, the whole batch costs roughly a single round-trip. The server replies to the commands in the order they were sent, and the client then reads all the responses and processes them as needed.
Consider a practical scenario where we want to track user engagement metrics on a website. Specifically, we might be interested in capturing the following data:
- Site-wide total unique visitors for today, differentiated by user ID and IP address.
- Page-specific total unique visitors for today, again distinguished by user ID and IP address.
To efficiently collect these metrics, we can use the HyperLogLog data structure, which estimates the number of unique elements using a small, bounded amount of memory. Here's how we can streamline the data collection process by using pipelining to send all relevant commands in a single batch:
$> cat pipelining.txt
PFADD daily_visitors_by_id:2024_06_27 "user_123"
PFADD daily_visitors_by_ip:2024_06_27 "127.0.0.1"
PFADD page_visitors_by_id:page_01:2024_06_27 "user_123"
PFADD page_visitors_by_ip:page_01:2024_06_27 "127.0.0.1"
$> redis-cli --pipe < pipelining.txt
And here's how you can achieve the same result programmatically in Go:
// Pipelining example in Go.
commands, err := client.Pipelined(ctx, func(pipe redis.Pipeliner) error {
// Add user ID and IP to daily unique visitors for the entire site.
pipe.PFAdd(ctx, "daily_visitors_by_id:2024_06_27", userID)
pipe.PFAdd(ctx, "daily_visitors_by_ip:2024_06_27", ipAddress)
// Add user ID and IP to daily unique visitors for a specific page.
pipe.PFAdd(ctx, "page_visitors_by_id:page_01:2024_06_27", userID)
pipe.PFAdd(ctx, "page_visitors_by_ip:page_01:2024_06_27", ipAddress)
return nil
})
if err != nil {
panic(err)
}
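// Each entry in 'commands' holds the reply of one pipelined command;
// PFADD replies 1 if the HyperLogLog was modified and 0 otherwise.
for _, cmd := range commands {
	fmt.Println(cmd.(*redis.IntCmd).Val())
}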
Pipelining efficiently batches multiple commands into a single round-trip, but it lacks atomicity: some commands in the pipeline may succeed while others fail. Additionally, there is no isolation, so commands from other clients may be interleaved with the pipelined commands and modify the same data, which might affect the outcome of the pipeline.
Despite these limitations, pipelining is a good fit for the example use case above, for several reasons. It minimizes network latency and handles high volumes of data quickly, which is essential in performance-critical environments. Counting unique visitors doesn't require strict atomicity: HyperLogLog already provides approximate counts, so an occasional failed command is acceptable. The lack of isolation is also harmless here, because adding elements to a HyperLogLog is order-independent, so interleaving with other clients' PFADD calls doesn't change the final estimate.
It is worth mentioning that, to optimize the processing of medium-sized pipelines given Dragonfly's multi-threaded nature, it's recommended to use multiple clients. Dragonfly also includes an experimental feature called pipeline squashing, which allows a single large pipeline to be processed in parallel. To demonstrate its efficiency, you can generate a large number of commands in a file and then execute them using the `redis-cli --pipe` command. Our tests show that Dragonfly processes a single large pipeline twice as fast as normal when `pipeline_squash` is configured properly. However, this performance advantage diminishes when small pipelines or transactions are sent in a loop.
With all these considerations in mind, pipelining is a valuable technique for applications where performance and throughput are priorities and where data operations are resilient to minor inconsistencies.
Transactions: Atomicity and Conditional Isolation
In Dragonfly, transactions play a crucial role in executing multiple operations atomically. This is primarily achieved through the `MULTI` and `EXEC` commands, which wrap a series of operations into a single atomic unit: all operations within the transaction are either executed together or not executed at all, enhancing reliability and data integrity.[^2] However, it's important to note that `MULTI` and `EXEC` alone do not isolate a read-modify-write sequence. Commands issued after `MULTI` are only queued until `EXEC`, so a client cannot read a value inside the transaction and base later writes on it, and any values it read beforehand may be changed by other clients before `EXEC` runs.
To address the need for isolation, the `WATCH` command comes in handy. It acts as an optimistic locking mechanism: it monitors the specified keys for changes, and if any watched key is modified before the transaction commits, `EXEC` aborts the transaction. Combined with a simple retry loop (which you need to implement yourself), this ensures that a transaction is executed only if the watched keys remain unchanged, thus preventing data races and maintaining isolation across operations. This behavior is often referred to as an atomic and isolated compare-and-swap (CAS).
Pipelining can be used together with transactions as well. By doing so, we can efficiently manage transactional workflows while reducing network round-trips. This combination not only enhances performance but also leverages the strengths of both transactions and pipelining to achieve more robust data processing workflows.
In the following example, we'll simulate a situation where multiple products can be purchased using the same coupon, and each purchase decreases both the specific product's count and the shared coupon count. Dragonfly's transaction and pipelining capabilities let us update both the coupon and product inventories atomically and conditionally.
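WATCH couponKey productKey
couponCount = GET couponKey        # read current values
productCount = GET productKey
couponCount = couponCount - 1      # computed in the application
productCount = productCount - 1
MULTI
SET couponKey $couponCount         # queued, not executed yet
SET productKey $productCount
EXEC                               # returns nil if a watched key changed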
In this example, the `WATCH` command ensures that the transaction commits only if the coupon and product keys were not modified by other clients in the meantime. If either key is modified, `EXEC` aborts the transaction, and the client can retry after re-reading the current values of the keys. You can achieve the same result programmatically in Go, with retry logic implemented as well:
import (
	"context"
	"errors"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// Optimistic locking, set a max retry count.
const maxRetries = 100
// Each purchase decreases both the specific product's count
// as well as the shared coupon count in a transaction.
func useCouponToPurchase(ctx context.Context, client *redis.Client, productId string) error {
// Keys for the coupon and product.
couponKey := "coupon:shared"
productKey := fmt.Sprintf("product:%s", productId)
// Transactional function.
txFunc := func(tx *redis.Tx) error {
// Get values from Dragonfly with pipelining.
commands, err := tx.Pipelined(ctx, func(pipe redis.Pipeliner) error {
pipe.Get(ctx, couponKey)
pipe.Get(ctx, productKey)
return nil
})
if errors.Is(err, redis.Nil) {
return errors.New("coupon or product not found")
}
if err != nil {
return err
}
// Retrieve values from the pipelined commands' responses.
couponCount, err := commands[0].(*redis.StringCmd).Int()
if err != nil {
return err
}
productCount, err := commands[1].(*redis.StringCmd).Int()
if err != nil {
return err
}
// Actual operations in application logic.
// Reduce coupon and product counts.
if couponCount <= 0 || productCount <= 0 {
return errors.New("insufficient coupon or product")
}
couponCount -= 1
productCount -= 1
// Operations are committed only if the watched keys remain unchanged.
// Note that these operations are also pipelined.
_, err = tx.TxPipelined(ctx, func(pipe redis.Pipeliner) error {
pipe.Set(ctx, couponKey, couponCount, 0)
pipe.Set(ctx, productKey, productCount, 0)
return nil
})
return err
}
// Retry if either key has been changed.
for i := 0; i < maxRetries; i++ {
err := client.Watch(ctx, txFunc, couponKey, productKey)
if err == nil {
// Successful transaction.
return nil
}
if errors.Is(err, redis.TxFailedErr) {
// Optimistic lock lost, retry.
continue
}
// Return any other error.
return err
}
return errors.New("reached max number of retries")
}
In the code snippet, the `useCouponToPurchase` function simulates a purchase operation that decreases both the coupon and product counts. Like pipelining, transactions can batch multiple operations together, but with the added benefits of atomicity, conditional isolation (via `WATCH`), and the ability to run application logic between the reads and the writes.
Lua Scripting: Ensuring Atomicity with Flexibility
Lua scripting in Dragonfly is a powerful tool for developers, allowing multiple operations to be grouped into a single atomic execution: the script runs as one uninterrupted unit, and no other command can run in the middle of it. Because the script runs on the Dragonfly server side, and Lua supports full-fledged control flow like any other programming language, a script can read a value, check a condition, and write the result in one step, which effectively provides isolation for compare-and-swap (CAS) scenarios. Other clients cannot see the intermediate states of the keys being modified by the script: they see either the state before the script starts or the state after it has completed.
Notably, Dragonfly offers an optional toggle to disable atomicity for Lua scripts. This feature is designed for scenarios where performance is more critical than strict atomicity, providing developers with a flexible trade-off. Learn more about Lua scripting techniques, internals, and configuration in this blog post.
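For comparison, the coupon purchase from the transaction example could be written as a single Lua script, so the read, the check, and both decrements happen server-side in one atomic step with no `WATCH` or retry loop. The sketch below is an illustration under our own assumptions, not code from the Dragonfly documentation: the script text is ours, and the program only computes the SHA1 digest that `SCRIPT LOAD` returns for the script and builds the `EVALSHA` arguments that would invoke the preloaded script, so it runs without a server. Both keys are passed through `KEYS`, which Dragonfly expects by default for any key a script accesses.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// purchaseScript decrements the shared coupon and one product atomically.
// KEYS[1] = coupon key, KEYS[2] = product key (hypothetical layout).
const purchaseScript = `
local coupon = tonumber(redis.call('GET', KEYS[1]))
local product = tonumber(redis.call('GET', KEYS[2]))
if coupon == nil or product == nil then
  return redis.error_reply('coupon or product not found')
end
if coupon <= 0 or product <= 0 then
  return redis.error_reply('insufficient coupon or product')
end
redis.call('DECR', KEYS[1])
redis.call('DECR', KEYS[2])
return 1
`

// scriptSHA returns the hex SHA1 digest that SCRIPT LOAD reports for src.
func scriptSHA(src string) string {
	sum := sha1.Sum([]byte(src))
	return hex.EncodeToString(sum[:])
}

// evalshaArgs builds the EVALSHA command for an already-loaded script.
func evalshaArgs(sha string, keys []string, args ...string) []string {
	cmd := []string{"EVALSHA", sha, fmt.Sprint(len(keys))}
	cmd = append(cmd, keys...)
	return append(cmd, args...)
}

func main() {
	sha := scriptSHA(purchaseScript)
	fmt.Println(evalshaArgs(sha, []string{"coupon:shared", "product:01"}))
}
```

In a go-redis application, `redis.NewScript(purchaseScript).Run(ctx, client, keys)` issues the same `EVALSHA` call and falls back to `EVAL` when the server has not cached the script yet, which is one way to get the benefit of preloaded scripts without managing digests by hand.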
Conclusion
As many of our readers may already be aware, Redis offers robust support for the batch operations discussed above as well. Given Dragonfly's design as a highly compatible drop-in replacement for Redis, it naturally inherits these capabilities. This compatibility ensures that developers who are familiar with Redis can easily transition to using Dragonfly without needing to significantly alter their application logic.
However, the concepts of pipelining and transactions can often be sources of confusion among developers. To clarify these concepts, a quick recap of each method and the key differences between Redis and Dragonfly in the context of batch operations should be helpful:
Pipelining
- Notes:
- Not atomic, no isolation.
- But it saves a lot of round-trip time.
- Ideal for performance-critical environments where operations are resilient to minor and unlikely inconsistencies.
- Redis: Commands are executed sequentially.
- Dragonfly: High throughput via parallel command execution by multiple threads.
Transactions
- Notes:
- Atomic, but no isolation by default.
- Isolation can be achieved with the `WATCH` command (optimistic locking).
- Ideal for ensuring atomicity across multiple operations; can be combined with pipelining.
- Redis: `MULTI` and `EXEC` commands for atomicity. Able to perform compare-and-swap operations using `WATCH`.
- Dragonfly: Same as Redis, with the added benefit of parallel command execution when commands are pipelined.
Lua Scripting
- Notes:
- Flexible and powerful mechanism for both atomicity and isolation.
- Performance is better with preloaded scripts.
- You need to learn an additional but easy-to-learn programming language, Lua.
- Source control for scripts is recommended since they are part of your code base!
- Redis: Lua scripts run atomically and in isolation.
- Dragonfly: Same as Redis, with the option to disable atomicity for performance-critical use cases.
In summary, Dragonfly builds upon the robust foundation of APIs and features provided by Redis, offering versatility and performance advancements with its brand-new multi-threaded shared-nothing architecture to meet the demands of modern, high-throughput applications. By understanding these enhancements and how they compare to traditional Redis capabilities, developers can better leverage Dragonfly to achieve optimal performance, reliability, and data integrity in their systems. To explore how Dragonfly can enhance your in-memory data handling capabilities, visit our website and join our community via the GitHub repository, the Discord chat server, or the Discourse forum today!
Footnotes
[^1]: At the time of writing, Dragonfly (v1.19.0) supports snapshotting for persistence, which doesn't capture every single write on disk, so durability cannot be guaranteed. Redis, on the other hand, supports both snapshotting and the append-only file (AOF). However, even if Redis AOF is configured as `always`, it is still a write-behind log rather than a write-ahead log (WAL). Redis AOF offers a stronger persistence guarantee, but without a WAL, full durability still cannot be achieved.
[^2]: There is one exception. If a command within a transaction is syntactically correct but fails during execution (e.g., because the command doesn't match the data type of the existing value), that command fails while all the other commands still execute, which breaches atomicity.