The Cost of Garbage Collection for State Machine Replication
Saved in:
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Abstract: | State Machine Replication (SMR) protocols form the backbone of many
distributed systems. Enterprises and startups increasingly build their
distributed systems on the cloud due to its many advantages, such as
scalability and cost-effectiveness. One of the first technical questions
companies face when building a system on the cloud is which programming
language to use. Among many factors that go into this decision is whether to
use a language with garbage collection (GC), such as Java or Go, or a language
with manual memory management, such as C++ or Rust. Today, companies
predominantly prefer languages with GC, like Go, Kotlin, or even Python, due to
ease of development; however, there is no free lunch: GC costs resources
(memory and CPU) and performance (long tail latencies due to GC pauses). While
there have been anecdotal reports of reduced cloud cost and improved tail
latencies when switching from a language with GC to a language with manual
memory management, so far, there has not been a systematic study of the GC
overhead of running an SMR-based cloud system.
This paper studies the overhead of running an SMR-based cloud system written
in a language with GC. To this end, we design from scratch a canonical SMR
system -- a MultiPaxos-based replicated in-memory key-value store -- and we
implement it in C++, Java, Rust, and Go. We compare the performance and
resource usage of these implementations when running on the cloud under
different workloads and resource constraints and report our results. Our
findings have implications for the design of cloud systems. |
DOI: | 10.48550/arxiv.2405.11182 |