The Unbearable Lightness of Horizontal Scaling

This post explores the limitations of horizontal scaling in terms of cluster reliability, load distribution, and cloud over-commitment. It also outlines design decisions that were made to allow Dragonfly, a drop-in Redis replacement, to scale vertically in order to handle heavy workloads and large data volumes on a single instance. By adopting Dragonfly's vertical scaling capabilities, organizations can achieve improved performance, cost savings, and operational efficiency in their distributed systems.

July 19, 2023

The challenge

When speaking with technical leaders and architects about Dragonfly's vertical scaling model, I often encounter initial skepticism: teams managing Redis at scale routinely struggle with high maintenance overhead, yet not everyone immediately grasps how Dragonfly's multi-threading and vertical scaling capabilities can address their production issues and lower their costs.

Their argument often goes as follows:

"It's impressive that you can achieve 1M QPS on a single instance, but I can simply deploy 32 single CPU VMs in cluster mode and attain even greater throughput. In fact, your single-node architecture is inherently limited, while horizontal scalability is boundless"

This argument isn't wrong. Vertical scaling will never be entirely linear, since some effort must be spent coordinating threads, and horizontal scaling can indeed achieve higher throughput with a comparable number of CPUs. Furthermore, horizontal scaling is effectively unbounded, while vertical scaling is not: at massive instance sizes (64 CPUs or more), vertical scaling efficiency tends to decline rapidly as server networking constraints are reached.

However, this argument focuses on a single metric: throughput. What it does not consider is the effect of horizontal scaling on cluster reliability. Moreover, the touted throughput figures depend on assumptions about cluster behavior that don't always hold. In this blog post, I will lay out some of the problems with scaling horizontally that often fly under the radar, giving horizontal scaling a sheen of invincibility and vertical scaling a bad rap.

Hidden hurdles of horizontal scaling

Before we dive deep into the analysis of horizontal scaling, you may find it useful to read my other blog post about memory-bounded workloads.

(TLDR: For memory-bounded workloads, vertical scaling saves memory because we do not need to keep over-provisioning margins as large as for multi-shard clusters)

Datastores operate as stateful clusters. To retrieve a given key, one must access the specific shard that owns it, which means cluster capacity is compartmentalized among shards. A stateful system is only available when all of its shards are available, so an overload on any single shard can independently render the whole cluster unavailable.

This differs from stateless systems, such as web frontends, where all nodes are interchangeable. If a node goes offline in these systems, the traffic is diverted to other nodes, leading to genuine pooling of cluster capacity.
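To make the shard-routing constraint concrete, here is a minimal sketch of hash-slot routing in the spirit of Redis Cluster, which maps every key to one of 16384 slots and every slot to a shard. The helper below is purely illustrative (it uses CRC32 as a stand-in for the CRC16 that Redis Cluster actually uses) and is not a Dragonfly or Redis API:

```python
# Illustrative hash-slot routing: every key deterministically maps to one shard,
# so a request for that key can only be served by that shard.
# Note: Redis Cluster uses CRC16 over 16384 slots; CRC32 here is just a stand-in.
import zlib

NUM_SLOTS = 16384

def key_to_shard(key: str, shard_count: int) -> int:
    slot = zlib.crc32(key.encode()) % NUM_SLOTS  # key -> slot
    return slot % shard_count                    # slot -> shard (naive assignment)

# If the shard below is overloaded, every key that maps to it is affected,
# no matter how idle the remaining shards are.
print(key_to_shard("user:1001:session", shard_count=32))
```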

Below we discuss how decreasing node size while increasing cluster size leads to inefficiencies and a higher chance of cluster unavailability.

Bigger clusters are more fragile

In the world of production systems, failures are an inherent possibility. Whether it's a host failure, changes in the production environment, sudden traffic spikes, or a software bug, no system is immune to these challenges. This means that we should design our systems with failure in mind, and with such a decision framework we reach an inevitable conclusion: the more nodes in the cluster, the higher the chance that one of them fails.

Let's define p as the probability of a node failure; then the probability of node liveness is 1 - p. As we scale up the cluster by increasing the number of shards n, the overall cluster liveness can be computed as (1 - p)^n. Hence, the availability of the cluster decreases exponentially as the cluster size grows. To be fair, we assume that we operate a properly designed system in a stable environment. In such cases, the value of p tends to be very small. Consequently, we can approximate (1 - p)^n as 1 - np. However, even with this approximation, it is evident that cluster availability decreases linearly with the cluster size.
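A quick back-of-the-envelope calculation makes the effect visible. The per-node failure probability below is an assumed example value, not a measured figure:

```python
# Cluster availability vs. cluster size, assuming independent shard failures.
# p = 0.001 is an illustrative per-node failure probability.
p = 0.001  # i.e., 99.9% single-node availability

for n in (1, 10, 32, 100):
    exact = (1 - p) ** n   # probability that all n shards are up at once
    approx = 1 - n * p     # first-order approximation for small p
    print(f"n={n:4d}  availability={exact:.3%}  (approx. {approx:.3%})")
```

With these numbers, a 100-shard cluster drops from 99.9% to roughly 90.5% availability, in line with the linear degradation described above.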

Your node capacity must be larger than you think

Diagram 1

When it comes to managing horizontally scalable clusters in production, one common observation is that the load on each server is rarely uniform. Some servers may be heavily overloaded, reaching 90% or higher utilization, while others may operate at a more relaxed 50% capacity. Additionally, the servers experiencing the highest and lowest loads can change over time. This variation is a direct result of how the load is distributed across cluster shards, especially in stateful systems where requests are directed to the servers hosting the relevant data. Uneven data distribution patterns can even lead to the emergence of hot keys, exacerbating the load imbalance. To handle potential load spikes effectively, it becomes necessary to provision each shard based on its maximum expected load, with larger margins of over-provisioning for smaller nodes.

Let's back up this claim with math. Suppose we use a single server sustaining 1M requests per second, where each request uses on average 0.1ms of CPU time. The load on the server is 1,000,000 × 0.0001s = 100 (i.e., 100 CPU-seconds of work per second). Classical queueing theory (the square-root staffing rule) says that we must provision the server to support roughly 100 + √100 = 110, or a 10% margin in this case. Now assume that instead of using a single server you provision a 100-shard cluster. Each shard will sustain 10K qps on average (100 times less) and its load will be 10,000 × 0.0001s = 1. However, for a robust and smooth experience we would need to provision each shard for 1 + √1 = 2; in other words, we would need a 100% margin!
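The same calculation in a few lines, using the square-root staffing rule with a safety factor of 1 (the factor is an illustrative choice; a real deployment would derive it from latency targets):

```python
import math

# Square-root staffing: provisioned capacity ~= load + beta * sqrt(load).
# beta = 1.0 is an illustrative safety factor, not a universal constant.
def required_capacity(qps: float, cpu_seconds_per_request: float, beta: float = 1.0):
    load = qps * cpu_seconds_per_request      # average busy CPU-seconds per second
    capacity = load + beta * math.sqrt(load)  # capacity to provision
    margin = capacity / load - 1              # over-provisioning margin
    return load, capacity, margin

# One big server handling 1M qps at 0.1ms of CPU per request.
load, cap, margin = required_capacity(1_000_000, 0.0001)
print(f"single server: load={load:.0f}, provision={cap:.0f}, margin={margin:.0%}")

# Each of the 100 shards handling 10K qps.
load, cap, margin = required_capacity(10_000, 0.0001)
print(f"per shard:     load={load:.0f}, provision={cap:.0f}, margin={margin:.0%}")
```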

You don’t get as much throughput per node as you think, due to cloud over-commit

Diagram 2

When we allocate a small single-CPU VM in the cloud, we do not get a guaranteed throughput capacity. Instead, we share the resources with all the other VMs on that server. The cloud may temporarily provide us with more network bandwidth if the host is under-utilized, but this throughput is not guaranteed. Once a noisy neighbor hops onto the host, it may take some of the bandwidth away from our VM. And as you can see, the difference between the best case and the worst case is huge.

We performed the following experiment to prove it:

  1. We reserved an m5.24xlarge host.
  2. We deployed a single m5.large VM (2 vCPUs) onto our reserved host and launched a single-threaded Redis server there. The redis-server reached 153K qps.
  3. Then, we added another 47 m5.large VMs, each running a Redis server, onto the same host and sent traffic to all of them at the same time.

And here is the interesting part. The traffic to the original redis-server went down from 153K qps to 18.5K qps - a staggering 8.3x reduction! The overall throughput capacity of all the Redis servers on the host was 890K qps, not the 7.3M qps one might expect by extrapolating the result of the first experiment. And by the way, when we load tested a single Dragonfly instance running on all the resources of the m5.24xlarge, it reached 1M qps, which is consistent with what 48 Redis servers achieved on that host.
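The argument above does not depend on any particular load-generation tool, but if you want to sanity-check throughput numbers against your own Redis or Dragonfly endpoint, here is a rough, minimal sketch using the redis-py client with pipelining. The endpoint, key names, and pipeline depth are placeholders; a real benchmark should use a dedicated load generator with many parallel connections:

```python
# Rough single-connection throughput probe (pip install redis).
# Illustrative only: not the tool used in the experiment described above.
import time
import redis

r = redis.Redis(host="127.0.0.1", port=6379)  # placeholder endpoint

TOTAL_REQUESTS = 100_000
PIPELINE_DEPTH = 32  # batch commands to amortize network round trips

start = time.perf_counter()
sent = 0
while sent < TOTAL_REQUESTS:
    pipe = r.pipeline(transaction=False)
    for i in range(PIPELINE_DEPTH):
        pipe.set(f"key:{sent + i}", "x")
    pipe.execute()
    sent += PIPELINE_DEPTH
elapsed = time.perf_counter() - start

print(f"{sent / elapsed:,.0f} requests/sec over a single pipelined connection")
```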

To summarize these three points: horizontal scaling is an important technique for building highly scalable distributed systems. However, scaling out with many small instances may cause instability in production environments and, eventually, much higher infrastructure costs due to aggressive over-provisioning of node capacity.

Scaling vertically with Dragonfly

Dragonfly is a drop-in Redis replacement that, unlike Redis, scales vertically. Workloads of millions of qps or up to 1TB of data can run on a single Dragonfly instance on all the major clouds.

To the best of our knowledge, Dragonfly is the only cost-efficient and well-maintained source-available technology that provides vertical scaling out of the box. A single Dragonfly instance is as powerful as a cluster of Redis instances, more predictable in terms of load, and more cost-effective than a managed Redis service. By deploying the community edition of Dragonfly on the right instance type on your cloud of choice, you can potentially reduce infrastructure costs by a factor of 2-3 compared to alternatives.

DragonflyDB Cloud (our managed service) will support horizontally scalable clusters of any size. DragonflyDB Cloud is designed to provide the most reliable and cost-efficient in-memory service in the world.
