We could not have predicted the events of the last few days. In a single week, Dragonfly transformed from a dream into reality.
In-memory datastores are the backbone powering high-frequency and low-latency workloads for almost all apps and services around us. They are used as a caching layer or the main datastore across industries such as social, gaming and e-commerce. While a wave of innovation swept the database industry with solutions such as Cockroach, Snowflake and BigQuery, the in-memory datastore market stagnated with technologies that were born over a decade ago. It’s time for a change.
Roman Gershman and I first met as engineers at Google. In 2013 we joined forces at Ubimo, a location intelligence company, where we had our first hands-on experience with in-memory datastores. As a startup that handled 300K requests per second, the only viable choice was using an in-memory datastore. At the beginning it was all great, but as the business grew, we started struggling with memory consumption, eviction logic and scaling issues.
Each limitation came with a workaround that was also limited in its own way. Impossible to grow with a single CPU? Use application-level sharding. Resharding is an issue? Use cluster mode. Snapshot doubles the memory usage? Leave 55% memory free, or perform a snapshot from the HA instance. Team spending too much time on management? Get a managed service. Costs are too high? Relax your latency requirements and move to a different solution. These changes came at a high cost and always at an inconvenient time. It’s the story of managing and scaling in-memory datastores for many companies around us.
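To make the sharding workaround concrete, here is a minimal sketch of application-level sharding and of why resharding hurts. It is an illustration under our own assumptions, not the code we ran at Ubimo: the function names `shard_for` and `moved_fraction`, the CRC32-modulo scheme and the `user:` key format are all hypothetical.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index by hashing the key (CRC32 modulo the shard count).

    Hypothetical helper: the application, not the datastore, decides
    which single-threaded instance owns each key.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose owning shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Illustrative key set; growing from 4 to 5 shards remaps most keys.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys change shards")
```

With a uniform hash, a key stays on the same shard only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one key in five. Going from four to five shards therefore relocates roughly four out of five keys, which is why naive application-level sharding makes resharding so painful.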
In 2020, Roman joined a big cloud infrastructure company as a principal engineer in a team providing in-memory datastores as a managed service. There he witnessed the limitations of the current solutions at a global scale. He didn’t believe that incremental improvements would provide an adequate answer to these problems - a profound architectural change was imperative. So in late 2021 we decided to join forces again and experiment with something we called Dragonfly.
Why Dragonfly? Dragonflies are speedy insects known for their agile flight and bright colors. They are elegantly designed with two sets of powerful independent wings, capable of moving in any direction, performing aerial maneuvers and sudden direction changes. These are exactly the qualities we want our in-memory datastore to have.
Dragonfly started as an experiment to see what an in-memory datastore could look like if designed in 2022. Based on lessons from our experience as memory store users and engineers at cloud companies, we knew that two key properties had to be preserved in Dragonfly: 1) sub-millisecond latency at very high throughput, and the more challenging one, 2) atomicity guarantees for all operations.
Roman was psyched to start coding. After a few months he came up with a groundbreaking POC based on a completely new architecture. It had atomicity, low latency and, on top of that, memory optimizations. At this point the naiad (baby dragonfly) hatched from its egg. We could already see that this was not an incremental improvement but a giant leap, and we were eager to share it with the world, but it still lacked maturity. We spent the next few months adding Redis and Memcached commands, documenting, testing and also adding some colors to its wings, just like a real dragonfly. Version by version it transformed into the Dragonfly we envisioned.
Then came May 31st - the “launch” date. The truth is that we did not plan to launch Dragonfly that day. We opened up the repo to our closest friends and design partners. One of our design partners shared the repo on Hacker News, and we learned about it from the spike in GitHub stars. This spike turned out to be a flood of stars and engagement. Dragonfly was flying, only way faster than we expected. Maybe the in-memory market was waiting for a Dragonfly?
Dragonfly is a modern replacement for Redis and Memcached. Dragonfly’s novel architecture scales vertically to support any workload, limited only by the physical properties of the underlying hardware. This makes Dragonfly simple to run, monitor and scale, which also translates to overall cost savings. Dragonfly is a masterpiece created by Roman Gershman. It was designed from the ground up to support next-century cloud workloads.
We released Dragonfly under the Business Source License (BSL), so anyone can use it to run their workloads as long as they do not offer Dragonfly as a service or a managed service. In the near future we are planning to add high availability and support for more Redis commands. With the unlocked power of multiple CPUs there are many aerial maneuvers Dragonfly can do. Some of them are already live, some of them are about to hatch and the best ones will probably come from the community.
So today we are announcing Dragonfly - a modern replacement for Redis and Memcached. We may not have predicted the events of the past few days, but we could not be more ecstatic about them.