Dragonfly

Question: Does Kafka store data in memory?

Answer

Apache Kafka is a distributed streaming platform designed to handle real-time data feeds with high throughput and low latency. The short answer is that Kafka stores its data on disk, but it also relies heavily on memory for performance, so a complete answer involves both.

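Kafka writes every message to an append-only log on disk and can retain very large amounts of data for long periods, as controlled by topic retention settings such as retention.ms and retention.bytes. It is not an in-memory system that loses data when a process shuts down; data persists on disk, which provides durability. If a broker process crashes or restarts, the messages already written to its log are still there. However, the OS normally flushes writes to the physical disk in the background rather than on every message, so protection against losing a whole machine comes from replication: each partition can be copied to several brokers, and with settings such as acks=all and min.insync.replicas, an acknowledged message can survive the failure of a single broker.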
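To make the durability point concrete, here is a minimal Java sketch of an append-only log file, assuming a simple length-prefixed record format. It is not Kafka's actual segment format or code; the AppendOnlyLogDemo and SimpleLog names are hypothetical. It only shows the basic property Kafka relies on: records appended to a file are still readable after the writer is closed and reopened, as if the process had restarted.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class AppendOnlyLogDemo {
    // Hypothetical minimal log (not Kafka's format): each record is [int length][UTF-8 bytes].
    static final class SimpleLog implements Closeable {
        private final DataOutputStream out;

        SimpleLog(Path file) throws IOException {
            // Open in append mode so existing records are never overwritten.
            this.out = new DataOutputStream(new BufferedOutputStream(
                Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)));
        }

        void append(String value) throws IOException {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length); // length prefix
            out.write(bytes);           // record payload
        }

        @Override
        public void close() throws IOException {
            out.close(); // flushes the buffer and hands the bytes to the OS
        }

        // Reads every record back from the file, e.g. after a simulated restart.
        static List<String> readAll(Path file) throws IOException {
            List<String> records = new ArrayList<>();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(file)))) {
                while (true) {
                    int len;
                    try {
                        len = in.readInt();
                    } catch (EOFException eof) {
                        break; // reached the end of the log
                    }
                    byte[] bytes = new byte[len];
                    in.readFully(bytes);
                    records.add(new String(bytes, StandardCharsets.UTF_8));
                }
            }
            return records;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("partition-0", ".log");
        // First "process lifetime": append two records, then close the log.
        try (SimpleLog log = new SimpleLog(file)) {
            log.append("order-1");
            log.append("order-2");
        }
        // Second "process lifetime": earlier records are still on disk; append one more.
        try (SimpleLog log = new SimpleLog(file)) {
            log.append("order-3");
        }
        System.out.println(SimpleLog.readAll(file)); // prints [order-1, order-2, order-3]
    }
}
```

Real Kafka logs add offsets, batching, checksums, index files, and segment rolling on top of this basic append-and-reread idea.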

However, Kafka also makes heavy use of the operating system's page cache, which keeps recently written and frequently read parts of the log in memory. This is done for performance: reading from and writing to memory is much faster than doing the same operations on disk. Rather than maintaining its own large in-process cache, Kafka writes to the filesystem and lets the OS decide what stays in RAM. As a result, consumers that keep up with producers are typically served from memory, so while its storage lives on disk, Kafka can often perform close to an in-memory system.
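The following Java sketch illustrates the difference between handing data to the page cache and forcing it to disk. FileChannel.write returns once the OS has the bytes (usually in the page cache), while FileChannel.force asks the OS to push them to the storage device. The PageCacheWriteDemo class and its flushEvery parameter are hypothetical; Kafka exposes comparable topic settings (flush.messages, flush.ms), but by default it leaves flushing to the OS and relies on replication for durability.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class PageCacheWriteDemo {
    // Appends one line per record and returns how many explicit flushes were made.
    // flushEvery is a hypothetical knob, loosely analogous to Kafka's flush.messages;
    // 0 means "never force, let the OS flush in the background".
    static int appendRecords(Path file, String[] records, int flushEvery) throws IOException {
        int flushes = 0;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < records.length; i++) {
                ByteBuffer buf = ByteBuffer.wrap((records[i] + "\n").getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf); // handed to the OS (page cache); not yet guaranteed on disk
                }
                if (flushEvery > 0 && (i + 1) % flushEvery == 0) {
                    ch.force(false); // fsync-like: push the file's data to the storage device
                    flushes++;
                }
            }
        }
        return flushes;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".log");
        int flushes = appendRecords(file, new String[] {"a", "b", "c", "d", "e"}, 2);
        // This read is typically served from the page cache because the pages were just written.
        System.out.println(flushes + " " + Files.readAllLines(file)); // prints 2 [a, b, c, d, e]
    }
}
```

Forcing after every record would make each write wait for the device, which is why systems like Kafka prefer to batch writes in memory and rely on other mechanisms for durability.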

Here is a simplified view of how data flows through Kafka and where memory comes into play:

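```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Producer (client code): send a record to a topic named "topic".
// Assumes an already configured KafkaProducer<String, String> named producer.
ProducerRecord<String, String> record = new ProducerRecord<>("topic", "key", "value");
producer.send(record);

// Broker (internal, simplified): the partition leader appends the record batch to
// its log segment files on disk, roughly what Log.appendAsLeader(...) does in the
// broker source (names vary by version). The write goes through the OS page cache.

// Consumer: fetch requests are served by reading from the same local log.
// Recently written or frequently read data is usually still in the page cache,
// so it is returned from memory instead of the physical disk.
```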

Keep in mind that this is deliberately simplified, and Kafka's actual broker code looks different. The point is that Kafka combines disk (for durable storage) and memory (through the OS page cache) to achieve both reliability and high performance.
