Distributed Systems - Distributed Consensus

Updated: 2018-12-15

Consensus Algorithms


Zookeeper - a distributed lock server It’s used for configuration. Really interesting piece of technology. Hard to use correctly

At the heart of ZooKeeper is an atomic messaging system that keeps all of the servers in sync.

Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky.

zookeeper:One common problem with all these distributed systems is how would you determine which servers are alive and operating at any given point of time? Most importantly, how would you do these things reliably in the face of the difficulties of distributed computing such as network failures, bandwidth limitations, variable latency connections, security concerns, and anything else that can go wrong in a networked environment, perhaps even across multiple data centers? These types of questions are the focus of Apache ZooKeeper, which is a fast, highly available, fault tolerant, distributed coordination service.

one server acting as a leader while the rest are followers. On start of ensemble leader is elected first and all followers replicate their state with leader. All write requests are routed through leader and changes are broadcast to all followers. Change broadcast is termed as atomic broadcast.

ZooKeeper is a distributed, hierarchical file system that facilitates loose coupling between clients and provides an eventually consistent view of its znodes, which are like files and directories in a traditional file system. It provides basic operations such as creating, deleting, and checking existence of znodes. It provides an event-driven model in which clients can watch for changes to specific znodes, for example if a new child is added to an existing znode. ZooKeeper achieves high availability by running multiple ZooKeeper servers, called an ensemble, with each server holding an in-memory copy of the distributed file system to service client read requests.

  • System-wide Configeration
  • fast to roll out(comparing to code push)


  • NTP network time protocol
  • vector clocks


paxo, zookeeper, blockchain, raft

  • paxo: lightweight txn in cassandra, bigtable’s chubby lock manager

coordination service,distributed configuration store.

  • Leader selection (primary/slave selection)
  • Distributed locking
  • Task queue/producer consumer queue
  • Metadata store