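Brewer's CAP theorem names three properties: consistency (every read sees the latest write), availability (every request to a non-failing node gets a response), and partition tolerance (the system keeps working when the network drops messages between nodes). No system can guarantee all three simultaneously, so during a network partition you must choose between consistency and availability.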
Raft decomposes consensus into leader election, log replication, and safety. A leader is elected per term via majority vote, receives all client writes, and replicates them to followers — committing once a majority acknowledges.
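The majority-commit rule can be sketched in a few lines. This is a minimal illustration, not a full Raft implementation: the function name `majority_commit_index` and its parameters are hypothetical, the log is reduced to a list of terms, and elections, RPCs, and persistence are left out.

```python
def majority_commit_index(match_index, leader_last_index, log_terms,
                          current_term, commit_index):
    """Return the commit index a Raft-style leader may advance to.

    match_index       -- highest log index known to be stored on each follower
    leader_last_index -- index of the leader's own last log entry
    log_terms         -- log_terms[i] is the term of entry i (index 0 unused)
    """
    cluster_size = len(match_index) + 1          # followers plus the leader
    majority = cluster_size // 2 + 1

    # Sort every replica's index from highest to lowest. The value at
    # position majority - 1 is stored on at least `majority` replicas.
    indices = sorted(list(match_index) + [leader_last_index], reverse=True)
    candidate = indices[majority - 1]

    # Raft only commits by counting replicas for entries from the leader's
    # current term; older entries become committed indirectly.
    if candidate > commit_index and log_terms[candidate] == current_term:
        return candidate
    return commit_index


# Five-node cluster: the leader holds entries 1..5, followers lag behind.
terms = [0, 1, 1, 2, 2, 2]
print(majority_commit_index([5, 4, 2, 1], 5, terms,
                            current_term=2, commit_index=1))
```

In this example entry 4 is stored on three of the five nodes (the leader and two followers), so the leader may advance its commit index to 4 even though two followers are still behind.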
Replication strategies differ across synchrony (sync vs async), topology (leader-follower vs multi-leader vs leaderless), and conflict resolution. Each trades write latency against durability guarantees and read freshness.
Partitioning distributes data across nodes via key ranges (range partitioning) or consistent hashing (hash partitioning). Consistent hashing minimizes data movement during node additions and removals — critical for elastic scaling.
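A small hash ring shows why consistent hashing moves so little data. The sketch below is one possible arrangement, assuming SHA-256 for placement and 64 virtual nodes per physical node; the class name `HashRing` and the node names are illustrative only.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:      # extremely unlikely collision: skip
                continue
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # The first point clockwise from the key's hash owns the key;
        # the modulo wraps around past the last point.
        i = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Adding a node only inserts new points on the ring, so every key that changes owner moves to the new node and the remaining keys stay where they were. With plain `hash(key) % node_count` placement, most keys would change owner instead.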
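Distributed systems must detect and recover from node failures, network partitions, and Byzantine faults. Phi accrual failure detectors report a continuous suspicion level rather than a binary up/down verdict, gossip protocols spread membership and failure information between nodes, and saga patterns undo partially completed multi-step transactions through compensating actions.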
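To make the phi accrual idea concrete, here is a simplified sketch. It assumes heartbeat intervals are exponentially distributed, a common simplification; the original formulation models them with a normal distribution. The class name `PhiAccrualDetector` and the window size are hypothetical.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level from heartbeat history (exponential approximation)."""

    def __init__(self, window=100):
        self.intervals = deque(maxlen=window)   # recent heartbeat gaps
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Return -log10 of the chance that a heartbeat is merely late."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # Under the exponential assumption, P(gap > elapsed) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
for now in (3.5, 5.0, 10.0, 25.0):
    print(f"t={now:>5}: phi={detector.phi(now):.2f}")
```

The caller picks a threshold on phi rather than a fixed timeout, which lets the same detector adapt to links with different heartbeat rhythms.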
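Physical clocks drift and skew across nodes. Lamport timestamps give an ordering consistent with causality but cannot tell concurrent events apart; vector clocks track causal history per process and can. Hybrid logical clocks combine a physical timestamp with a logical counter, maintaining causality while staying close to physical time.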
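The difference between the two logical clocks shows up in a short sketch. The helper names below (`lamport_receive`, `vc_tick`, `vc_merge`, `vc_compare`) are illustrative, and vector clocks are represented as plain dictionaries from process id to counter.

```python
def lamport_receive(local, received):
    """Lamport rule on message receipt: jump past both clocks."""
    return max(local, received) + 1


def vc_tick(vc, pid):
    """Return a copy of vector clock `vc` with process `pid` advanced by one."""
    out = dict(vc)
    out[pid] = out.get(pid, 0) + 1
    return out


def vc_merge(local, received, pid):
    """On receipt: take the element-wise maximum, then tick the receiver."""
    merged = {p: max(local.get(p, 0), received.get(p, 0))
              for p in local.keys() | received.keys()}
    return vc_tick(merged, pid)


def vc_compare(a, b):
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a = vc_tick({}, "p1")        # p1 performs a local event
b = vc_tick({}, "p2")        # p2 performs an independent local event
c = vc_merge(b, a, "p2")     # p2 then receives a message carrying a
print(vc_compare(a, b), vc_compare(a, c))
```

Here `a` and `b` are reported as concurrent because neither process had heard from the other, while `a` is reported as before `c`. A single Lamport counter would assign both `a` and `b` the value 1 and could not express that distinction.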
The researchers, engineers, and system thinkers who built the theoretical and practical foundations of distributed computing.
Core vocabulary for distributed system design, consensus, and fault tolerance.