Distributed Consensus: Raft vs Paxos
2025-01-10•2 min read
computer-science, engineering
Distributed consensus is one of the fundamental problems in computer science. Two prominent solutions, Paxos and Raft, take different approaches to achieving the same goal: getting multiple nodes to agree on a value.
The Consensus Problem
A consensus protocol must guarantee three properties:
- Agreement: No two non-faulty nodes decide on different values
- Validity: The decided value was proposed by some node
- Termination: All non-faulty nodes eventually decide
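To make the three properties concrete, here is a minimal Python sketch that checks them against the outcome of one finished run. The function name `check_consensus` and its inputs are hypothetical, chosen for illustration. Termination is a liveness property, so a check like this can only confirm that every non-faulty node had decided by the time the run was recorded.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    proposed: Set[Hashable],
    decided: Dict[str, Optional[Hashable]],
    faulty: Set[str],
) -> Dict[str, bool]:
    """Check agreement, validity, and termination for one recorded run.

    `decided` maps a node name to the value it decided, or None if it
    never decided. `faulty` lists the nodes that crashed during the run.
    """
    # Only non-faulty nodes are bound by the properties in this sketch.
    correct = {node: value for node, value in decided.items() if node not in faulty}
    values = {value for value in correct.values() if value is not None}
    return {
        "agreement": len(values) <= 1,      # at most one distinct decision
        "validity": values <= proposed,     # every decision was proposed
        "termination": all(value is not None for value in correct.values()),
    }


outcome = check_consensus(
    proposed={"a", "b"},
    decided={"n1": "a", "n2": "a", "n3": None},
    faulty={"n3"},
)
print(outcome)
```

In this run `n3` crashed before deciding, which is allowed: the properties only constrain the nodes that stayed up.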
Paxos: The Classic Approach
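Paxos, introduced by Leslie Lamport, is notoriously difficult to understand. A single round of basic Paxos has two phases, each made up of a request and a response:
- Prepare phase (1a): A proposer picks a proposal number and sends a prepare request to the acceptors
- Promise phase (1b): Each acceptor promises to ignore lower-numbered proposals and reports any value it has already accepted
- Accept phase (2a): Once a majority has promised, the proposer sends an accept request, reusing the highest-numbered value reported by the acceptors if there is one
- Accepted phase (2b): Each acceptor accepts the value unless it has since promised a higher proposal number
Most of the complexity comes from handling failures and concurrent proposals: competing proposers can repeatedly preempt one another, so practical deployments usually designate a single distinguished proposer.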
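The four phases can be sketched in a few lines of Python. This is a simplified, single-process model of single-decree Paxos, not a production implementation: the `Acceptor` class and `propose` function are hypothetical names, messages are plain method calls, and no acceptor crashes or loses state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest proposal number promised so far
    accepted_n: int = -1                  # number of the accepted proposal, -1 if none
    accepted_value: Optional[str] = None

    def on_prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Promise phase: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_value)
        return None  # reject: a higher or equal number was already promised

    def on_accept(self, n: int, value: str) -> bool:
        """Accepted phase: accept unless a higher number was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value, or None if preempted."""
    majority = len(acceptors) // 2 + 1

    # Prepare phase: collect promises from the acceptors.
    replies = [acceptor.on_prepare(n) for acceptor in acceptors]
    promises = [reply for reply in replies if reply is not None]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, adopt the highest-numbered one.
    prev_n, prev_value = max(promises, key=lambda promise: promise[0])
    if prev_n >= 0 and prev_value is not None:
        value = prev_value

    # Accept phase: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(acceptor.on_accept(n, value) for acceptor in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "x"))  # first proposal
print(propose(acceptors, 2, "y"))  # later proposal with a different value
```

The second call illustrates the safety rule: once a value has been chosen, a later proposer learns it during the promise phase and must re-propose it instead of its own value.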
Raft: Understandability First
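Raft, by Diego Ongaro and John Ousterhout, was designed with understandability as a primary goal. It decomposes consensus into:
- Leader election: Nodes use randomized election timeouts to elect one leader per term
- Log replication: The leader appends client commands to its log and replicates them to followers; an entry is committed once a majority stores it
- Safety: Ensuring correctness even with failures, for example by granting votes only to candidates whose log is at least as up-to-date as the voter's
The key insight is that having a strong leader simplifies the protocol significantly: log entries flow in one direction only, from the leader to the followers.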
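The interplay between leader election and safety shows up in how a node answers a vote request. The sketch below follows the vote-granting rule described in the Raft paper, but the `RaftNode` class and its field names are assumptions made for illustration, and persistence, timers, and networking are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RaftNode:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)

    def last_log(self) -> Tuple[int, int]:
        """Return (last_term, last_index); the index is 1-based, 0 if the log is empty."""
        if not self.log:
            return (0, 0)
        return (self.log[-1][0], len(self.log))

    def on_request_vote(
        self, term: int, candidate_id: str, last_log_term: int, last_log_index: int
    ) -> bool:
        """Decide whether to grant a vote to `candidate_id` for `term`."""
        if term < self.current_term:
            return False  # stale candidate
        if term > self.current_term:
            # A newer term resets our vote.
            self.current_term = term
            self.voted_for = None
        # Safety check: the candidate's log must be at least as up-to-date as ours.
        # Tuples compare by term first, then by index.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return True
        return False


follower = RaftNode("b", current_term=1, log=[(1, "set x")])
print(follower.on_request_vote(2, "a", 1, 1))  # first request in term 2
print(follower.on_request_vote(2, "c", 1, 1))  # second candidate, same term
```

Because each node votes at most once per term and a candidate needs a majority, at most one leader can be elected in any given term.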
Implementation Considerations
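When implementing consensus:
- Raft is generally easier to implement correctly, in part because its description covers practical details such as leader election and membership changes
- Paxos can be more flexible: any node may act as a proposer, and variants such as Multi-Paxos and Fast Paxos adapt it to different latency and leadership requirements
- Both require careful handling of network partitions, since only a partition that contains a majority of nodes can make progress
- Performance characteristics vary by workload; with a stable leader, both Raft and Multi-Paxos can commit an entry in one round trip to a majority, so differences often come from the implementation more than the algorithm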
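For the partition point, the arithmetic behind majority quorums is worth spelling out. The helper names below are hypothetical, and the sketch assumes simple majority quorums, which is what Raft and basic Paxos use.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def fault_tolerance(cluster_size: int) -> int:
    """Number of node failures the cluster can survive while staying available."""
    return (cluster_size - 1) // 2


def can_make_progress(partition_size: int, cluster_size: int) -> bool:
    """A partition can commit new entries only if it holds a majority."""
    return partition_size >= quorum_size(cluster_size)


for size in (3, 4, 5):
    print(size, quorum_size(size), fault_tolerance(size))
```

A 4-node cluster needs 3 nodes for a quorum and tolerates only 1 failure, the same as a 3-node cluster, which is why odd cluster sizes are the usual choice. If a 5-node cluster splits 3 and 2, only the side with 3 nodes can keep committing.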
Real-World Usage
- etcd and Consul use Raft
- Google's Chubby lock service uses Paxos
- ZooKeeper uses Zab, a leader-based atomic broadcast protocol that is related to Paxos but distinct from it
The choice often depends on the specific requirements and the team's familiarity with the algorithm.