Hi everyone, this is Kostas, a core data plane engineer at Dragonfly. I am excited to share insights into Linux memory management and its relevance to Dragonfly. Understanding how Linux handles memory overcommit is crucial for optimizing performance and avoiding unexpected behavior in production environments. In this post, we'll explore overcommit modes, their impact on memory allocation, and which mode is the best fit for Dragonfly to ensure optimal performance and reliability. Let's dive in!
What is Memory Overcommit?
Memory overcommit is a feature of Linux and other operating systems. Specifically for Linux, it determines how the kernel handles memory allocation requests from processes. Understanding memory overcommit is essential for:
- Preventing out-of-memory (OOM) crashes in production systems.
- Optimizing application performance.
- Debugging memory-related issues.
- Making informed decisions about system configuration.
Key Memory Overcommit Concepts
Before getting to the details, let's define some key terms we will use in this post:
- Virtual Memory: The memory a process can use, regardless of physical RAM.
- Memory Page: A contiguous, fixed-size region of virtual memory addresses.
- Page Fault: An "exception" raised when accessing unmapped virtual memory.
- OOM Killer: A kernel mechanism that terminates processes when memory is exhausted.
How Do Memory Requests Work?
The Linux kernel has different modes (or policies) for allocating memory. Depending on the policy used, a process can request more memory than is physically available. What does it mean for a process to request more memory? Here are the steps involved in a simplified form:
- The process makes a request to the kernel for more memory.
- The kernel fulfills the request by handing over a memory page.
- At this point, the page exists only virtually: its virtual addresses are not yet mapped to physical RAM.
- When the process first accesses one of those virtual addresses, the memory management unit (MMU) tries to translate it into a physical address.
- The MMU will raise an "exception" because this mapping does not yet exist.
- This "exception" is called a page fault, and the page fault handler is responsible for creating that mapping on demand.
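The steps above can be observed from user space. The following is a minimal sketch, assuming a Linux system where mincore() reports residency for anonymous mappings; the helper names count_resident_pages and touch_and_measure are illustrative and not part of any library.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many pages in [addr, addr + len) the kernel reports as resident.
// Returns -1 if mincore() fails.
long count_resident_pages(void* addr, std::size_t len) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  std::vector<unsigned char> vec((len + page - 1) / page);
  if (mincore(addr, len, vec.data()) != 0) return -1;
  long resident = 0;
  for (unsigned char v : vec) resident += v & 1;  // lowest bit = resident
  return resident;
}

struct PagingResult {
  long before;  // resident pages right after the allocation
  long after;   // resident pages after writing to some of them
};

// Maps `total_pages` anonymous pages, writes one byte into the first
// `pages_to_touch` of them, and reports residency before and after.
PagingResult touch_and_measure(std::size_t total_pages, std::size_t pages_to_touch) {
  assert(pages_to_touch <= total_pages);
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t len = total_pages * page;
  void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return {-1, -1};

  PagingResult result;
  result.before = count_resident_pages(mem, len);
  volatile char* bytes = static_cast<volatile char*>(mem);
  for (std::size_t i = 0; i < pages_to_touch; ++i) {
    bytes[i * page] = 1;  // first write to a page triggers a page fault
  }
  result.after = count_resident_pages(mem, len);
  munmap(mem, len);
  return result;
}
```

Calling touch_and_measure(64, 16) from main() on a typical Linux machine should show few or no resident pages before the writes and more afterwards, although the exact numbers depend on the kernel and its configuration.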
A notable example of the side effects of a page fault variant can be found in the fork() system call. When a process forks itself, the parent and the child share their memory pages. When either process writes to one of those pages, a copy-on-write is triggered on that page. This is potentially expensive and is a pain point for Redis, which forks itself to take snapshots. The problem is that in high-load scenarios, if the parent process writes to the majority of its pages, each of those pages must be duplicated so that the child keeps its original view. In the worst case, every page of the parent ends up copied, which consumes double the memory. To avoid this, Dragonfly doesn't fork itself for snapshots and uses a different algorithm to solve this problem in an innovative and efficient way. Check out our blog post on memory-efficient snapshotting for additional information.
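To make the copy-on-write semantics concrete, here is a small sketch of a process that forks and lets the child overwrite a buffer it initially shares with the parent. The function name parent_byte_after_child_write is made up for this example, and the code only illustrates the kernel behavior described above; it is not how Redis or Dragonfly take snapshots.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Forks, lets the child overwrite its view of a buffer, and returns the
// first byte as the parent sees it afterwards (or -1 if fork() fails).
int parent_byte_after_child_write(std::size_t buffer_size) {
  assert(buffer_size > 0);
  std::vector<char> buffer(buffer_size, 'P');

  const pid_t pid = fork();
  if (pid < 0) return -1;

  if (pid == 0) {
    // Child: the first write to each shared page faults, and the kernel
    // gives the child a private copy of that page.
    std::memset(buffer.data(), 'C', buffer.size());
    _exit(buffer[0] == 'C' ? 0 : 1);
  }

  // Parent: wait for the child, then read the buffer. The parent's pages
  // were never modified, so it still sees the original contents.
  int status = 0;
  waitpid(pid, &status, 0);
  return buffer[0];
}
```

The parent still reads 'P' after the child has written 'C' everywhere, because every page the child wrote to was copied first. With a large buffer, that copying is where the extra memory goes.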
A process could theoretically have an "infinite" number of pages as long as those pages are not mapped physically. For instance, suppose you are running a process on a system with 16GB of RAM, and the process requests 32GB. As long as the process does not use the extra 16GB it requested, everything is fine. After all, why would this be a problem? Your application is not actually trying to use more physical RAM than what is available. Any request to the kernel for more memory will succeed, regardless of the physical amount of memory available. It's up to the application to ensure that it does not get close to an OOM situation. What I just described, without the associated nuances, is one of the modes behind the statement that Linux has different modes (or policies) for memory allocation.
Memory Overcommit Modes
So, what are these memory overcommit modes, and how do they behave? There are three modes available, namely 0, 1, and 2, and the active mode is configurable.
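Before changing the mode, it can help to check which one is currently active. The sketch below reads the same /proc/sys/vm/overcommit_memory file that is written to later in this post. The function names are illustrative, and the short labels follow the kernel documentation's naming for each mode.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Maps the numeric mode to a short human-readable label.
std::string describe_overcommit_mode(int mode) {
  switch (mode) {
    case 0: return "heuristic overcommit";
    case 1: return "always overcommit";
    case 2: return "never overcommit";
    default: return "unknown";
  }
}

// Reads the current mode from the given file.
// Returns -1 if the file cannot be opened or parsed.
int read_overcommit_mode(const std::string& path = "/proc/sys/vm/overcommit_memory") {
  std::ifstream in(path);
  int mode = -1;
  if (in >> mode) return mode;
  return -1;
}
```

For example, describe_overcommit_mode(read_overcommit_mode()) returns a label for the mode the machine is running in, or "unknown" if the file is not readable.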
Before I dive into each of the overcommit modes, there are two important metrics in the /proc/meminfo file that are relevant to memory overcommit: CommitLimit and Committed_AS.
- CommitLimit is the total amount of memory that can be allocated system-wide. It takes swap into account and is not the actual physical memory available. In fact, it is typically lower than the physical memory, and it is configurable.
- Committed_AS is the total memory currently allocated by all processes on the system, whether or not they have used it yet.
Both of these variables have different semantics and importance based on the overcommit mode used, and their behavior will be discussed separately below for each mode.
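Both values can be read programmatically. The following is a minimal sketch of a parser for the "Key:   value kB" lines found in /proc/meminfo; the function name meminfo_kb is an assumption made for this example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Scans meminfo-formatted text for `key` and returns its value in kB.
// Returns -1 if the key is missing or its value cannot be parsed.
long meminfo_kb(std::istream& in, const std::string& key) {
  const std::string prefix = key + ":";
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) != 0) continue;
    std::istringstream fields(line.substr(prefix.size()));
    long value = -1;
    if (fields >> value) return value;
    return -1;
  }
  return -1;
}
```

To query the live system, open the file with std::ifstream meminfo("/proc/meminfo") and call meminfo_kb(meminfo, "CommitLimit"). Reopen or rewind the stream before looking up Committed_AS, since each call consumes the stream up to the matching line.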
Note that I am omitting and oversimplifying some of the underlying machinery. For example, there is also swap space, which I won't go into, as well as different kinds of page faults. Once you get the gist of the concepts I am discussing, it becomes progressively easier to fill in the gaps and appreciate the real complexity of the kernel.
Let's now jump to the different overcommit modes!
Overcommit Mode = 1
We briefly discussed this mode, the simplest one, in the previous section. In this mode, the kernel always overcommits, and a process can request more memory than what is physically available. It's helpful to imagine booking airline tickets. Airlines allow plane seats to be overbooked. So even if a plane only has 100 seats, the airline can sell 150 tickets. As long as no more than 100 people board the flight, everything works as expected. It's only when more than 100 people try to board that some of them must be turned away. After all, there are only 100 seats available, not 150!
This is exactly how the kernel works in this mode: it will allow the process to allocate more memory than is physically available. The side effect of this is that memory allocation calls (malloc(), new, etc.) do not fail because physical memory is short. The application is now responsible for monitoring the resources used and avoiding those dreaded OOM scenarios. Otherwise, if the application tries to use all the requested pages, it will trigger the OOM killer, a kernel mechanism that kills processes that consume too much memory. This effectively shields the system from OOM scenarios. It's important to understand that the OOM killer is beyond the process's control and will brutally kill the process, provided it manages to do so before the whole system runs out of memory. Otherwise, you will get a system crash. Some configurations are available, but that's beyond the scope of this article.
What is explained above is easy to observe in practice. First, you can enable overcommit mode 1 by running the following command:
> echo 1 | sudo tee /proc/sys/vm/overcommit_memory
Then, you can compile and run the following C++ code snippet:
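// code snippet: allocate 32GB of memory
#include <cassert>
#include <cstddef>
#include <cstdlib>

int main() {
  const std::size_t big_allocation = 34359738368ul; // 32GB
  // Only virtual memory is requested here; no page is touched.
  void* ptr = std::malloc(big_allocation);
  assert(ptr != nullptr);
  std::free(ptr);
}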
On my system that has 16GB of physical memory, the assert(ptr != nullptr) never triggers, even though the allocation size is 32GB. Last but not least, in this mode, the value of CommitLimit is ignored and Committed_AS merely denotes the total virtual memory allocated by all processes.
Overcommit Mode = 2
This mode is the opposite of overcommit_memory=1. It imposes a hard limit on allocations and does not allow memory overcommit. Let's set this mode by running the following command:
> echo 2 | sudo tee /proc/sys/vm/overcommit_memory
Run the same code snippet as before, and you will notice that assert(ptr != nullptr) triggers because the allocation request is rejected. The question now is how much memory a process can allocate. The answer is simple and depends on the CommitLimit and Committed_AS variables. The former becomes a hard limit, and processes can only allocate as much memory as the difference between the two, that is, CommitLimit - Committed_AS. As processes request more memory, the value of Committed_AS grows. Once it reaches CommitLimit, further allocations are rejected.
Similarly, you can write a small program, along the lines of the code snippet above, that gradually asks for more memory and reads /proc/meminfo on each iteration. You will see that once Committed_AS reaches CommitLimit, your allocation requests are rejected, and malloc() returns nullptr. Last but not least, notice that CommitLimit is way lower than the currently available RAM, which makes this mode a less desirable option for performant systems like Dragonfly. Note that the CommitLimit value can change, as it depends on the vm.overcommit_ratio variable, which the user can update.
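To see why CommitLimit tends to be well below the installed RAM, it helps to write down how the kernel documentation describes it: (total RAM - huge pages) * overcommit_ratio / 100 + swap, with a default ratio of 50. The sketch below simply encodes that formula. The function names are illustrative, and it assumes vm.overcommit_kbytes is not set, since that setting would replace the ratio-based calculation.

```cpp
#include <cassert>

// CommitLimit in kB, following the formula in the kernel documentation.
// Assumes vm.overcommit_kbytes is unset, so overcommit_ratio applies.
long long commit_limit_kb(long long total_ram_kb, long long hugetlb_kb,
                          long long swap_kb, int overcommit_ratio) {
  assert(overcommit_ratio >= 0);
  return (total_ram_kb - hugetlb_kb) * overcommit_ratio / 100 + swap_kb;
}

// Memory that can still be committed in mode 2 before requests are rejected.
long long commit_headroom_kb(long long commit_limit, long long committed_as) {
  return commit_limit - committed_as;
}
```

For a machine with 16GB of RAM, no swap, no huge pages, and the default ratio of 50, this gives a limit of 8GB, half of the physical memory. Raising vm.overcommit_ratio or adding swap increases the limit accordingly.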
Overcommit Mode = 0
The last mode is overcommit_memory=0, which is also the kernel's default mode. It is known as heuristic overcommit because the kernel uses its own algorithm to decide if an allocation should be rejected or overcommitted. By default, it rejects unreasonable allocations. For example, the code snippet we have that allocates 32GB of memory will fail by triggering assert(ptr != nullptr). However, the allocation will succeed if I reduce the allocation from 32GB to 17GB, which still surpasses the physical memory available on my system. Therefore, it's important to remember that with overcommit_memory=0, a memory allocation request can go either way:
- It might fail, with malloc() returning nullptr, similar to overcommit_memory=2.
- It might also succeed even if it surpasses the physical memory available, similar to overcommit_memory=1. In that case, your process starts with none of those pages mapped. Once the pages it actually accesses approach the physical memory available, the OOM killer might be triggered and kill the process. Alternatively, your system might crash.
As for the CommitLimit and Committed_AS variables, they behave the same in this mode as with overcommit_memory=1.
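Because a request can go either way in this mode, application code should check the result of every large allocation. The helpers below are a small sketch of that check for both malloc() and the non-throwing form of new; the function names are made up for this example. Keep in mind that a successful return only means the virtual memory was granted, not that physical pages will be available when the memory is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Returns true if malloc() granted the request. The memory is released
// immediately; no page is ever touched.
bool malloc_succeeds(std::size_t bytes) {
  void* ptr = std::malloc(bytes);
  if (ptr == nullptr) return false;
  std::free(ptr);
  return true;
}

// Same check with operator new. Plain `new` throws std::bad_alloc on
// failure, while the nothrow form returns nullptr instead.
bool nothrow_new_succeeds(std::size_t bytes) {
  char* ptr = new (std::nothrow) char[bytes];
  if (ptr == nullptr) return false;
  delete[] ptr;
  return true;
}
```

On the 16GB system described above, malloc_succeeds(34359738368ul) would be expected to return false under overcommit_memory=0, while the 17GB request would return true.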
Summary
Linux memory management is a complex yet fascinating system, and understanding overcommit modes is key to optimizing application performance. In this post, we explored how different overcommit modes affect memory allocations and why overcommit_memory=1 is often the best choice for applications needing greater control over memory usage. This mode ensures allocations succeed during high-load scenarios, avoids costly copy-on-write operations, and keeps memory management predictable.
While running Dragonfly, we recommend overcommit_memory=1, as it aligns with our sophisticated design to handle memory efficiently while avoiding the OOM killer. Even better, Dragonfly Cloud, our managed service, takes many system optimizations like this into account for you, offering seamless performance without needing to tweak Linux kernel settings yourself.