A prelude to an analysis of the Redis memory store
Will Redis stay competitive in a few years without reinventing itself?
November 28, 2021
Over the last 13 years, Redis has become a truly ubiquitous memory store that has won the hearts of countless DevOps and software engineers. Indeed, according to the 2021 Stack Overflow Developer Survey, Redis is the most loved database for the fifth year in a row, and it sits at the top of the DB-Engines key-value store ranking, well ahead of the next contender. But how well does Redis utilize modern hardware? Will it stay competitive in a few years without reinventing itself?
To understand the choices behind Redis's design, I have been reading old posts by Salvatore Sanfilippo (@antirez), the creator of Redis. Before I begin, I want to say that I have tremendous respect for Salvatore and for how much he achieved by being a talented and authentic software programmer (even though he does not want to be remembered as such).
Based on his notes and GitHub discussions, I identified the following architectural principles in Redis:
- Simple is beautiful (and code is a poem). Probably the strongest theme in Redis is simplicity. Salvatore prefers simple solutions, and he expressed his attitude to coding in his Redis manifesto. As a consequence, Redis lives in a single, self-contained codebase with little reliance on third-party projects, and its functionality is implemented in plain C on top of the POSIX API.
- Single-threaded architecture. Redis development began a few years after Memcached. By that time, Memcached, the predecessor of Redis, was already a mature system used and supported by large, highly technological companies like Facebook and Twitter, and it employed a multi-threaded architecture to scale its I/O performance vertically within a single node. Remarkably, Antirez chose a single-threaded design instead: Redis uses a single thread to manipulate its main in-memory dictionary. He has defended this approach many times with the following arguments:
- Redis cares deeply about latency and adopts a shared-nothing architecture to keep both its average and tail latencies under control.
- Most CPU time is spent in the kernel handling I/O rather than in userland manipulating Redis data structures, so the benefit of parallelizing command execution is limited.
- Request pipelining can increase throughput by orders of magnitude (see the sketch after this list).
- On the other hand, multi-threading adds complexity. Quoting Antirez: "...Slower development speed to achieve the same features. Multi-thread programming is hard... In a future of cloud computing, I want to consider every single core as a computer itself..."
- Vertical scaling has physical limits; therefore, horizontal scaling (e.g., Redis Cluster) is the preferred way to scale.
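To make the pipelining argument concrete, here is a minimal sketch using the redis-py client (my choice of client and the key names are assumptions; any client that supports pipelining would do), run against a local Redis on the default port 6379. Sending commands one by one pays a full network round trip and syscall per command, while a pipeline buffers commands and flushes many of them per round trip.

```python
import time
import redis  # redis-py client, assumed installed via `pip install redis`

r = redis.Redis(host="localhost", port=6379)  # assumes a local Redis instance
N = 10_000

# One round trip per command: dominated by network and syscall overhead.
start = time.perf_counter()
for i in range(N):
    r.set(f"key:{i}", i)
sequential = time.perf_counter() - start

# Pipelined: commands are buffered client-side and sent in large batches,
# so many commands share a single round trip.
start = time.perf_counter()
pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
for i in range(N):
    pipe.set(f"key:{i}", i)
pipe.execute()
pipelined = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, pipelined: {pipelined:.2f}s")
```

On a loopback connection the pipelined loop is typically several times to an order of magnitude faster; over a real network link, where round-trip time dominates, the gap widens further, which is exactly the effect the argument above relies on.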
In addition to the principles above, Redis maintains unique design goals that differentiate it from, say, a disk-based database. I believe that if we were to list Redis’s design goals by priority, they would be:
- Low latency
- High throughput
- Memory efficiency
- High availability
- Strong consistency guarantees
- Durability
Obviously, anyone wanting to implement an alternative memory store would need to prioritize their design goals similarly to Redis. In other words, the new store's design should not sacrifice low latency for durability or for strong consistency.
Retrospective
I think that simplicity was the main guideline for Redis, and it heavily affected architectural decisions such as its serialization algorithm, the efficiency of its data structures, its reliability, and more. Even the choice of threading model was made, in part, for simplicity's sake. And if Redis is an experiment in how far one can go nowadays with relatively simple solutions, it has, without question, succeeded tremendously.
Redis in 2021 is a mature product with a relatively stable feature set, and the questions I am asking myself today are:
a) How much more efficient could Redis be today if it adopted state-of-the-art algorithms and data structures?
b) How much simpler would it be for its users if it favored product simplicity over simplicity of implementation?
In other words, if one were to implement a drop-in replacement for Redis today by redesigning it from scratch, how would it compare to Redis? I do not have a definite answer to this question today but hope to have one in a few months.
Again, I am not disputing the tremendous popularity of the Redis memory store. It seems that Salvatore's decision to go for simplicity and deliver features quickly in Redis's early days paid off: today, Memcached is a niche system, and the vast majority of software stacks use Redis. However, I do claim (currently without proof) that it is possible to vastly improve the reliability, performance, and cost-efficiency of a Redis-like memory store that follows similar design goals but employs different architectural principles.
In my next posts, I am going to detail Redis’s specific design choices based on the principles stated above and show how a different architecture, if better aligned with modern hardware systems, could bring what I think would be a disruptive change to in-memory data stores.