Distributed Systems - The CAP Theorem

Updated: 2018-06-16
  • Consistent
  • Available
  • Partition tolerant: "The network will be allowed to lose arbitrarily many messages sent from one node to another", meaning one server is detached from the other servers, either it can answer request immediately(available) or confirm it gets the latest answer before replying(consistent), but not both.

    • Cause of Partitions: network failures or machine failures

Discussion

  • can have only two at once
  • to scale you have to partition(distributed system), so you are left with choosing either high consistency or high availability for a particular system. You must find the right overlap of availability and consistency.
  • (another way to put it:) any delay between nodes can be modeled as a temporary network partition and in that event you have but two choices either wait to return the latest data at a peer node (C) or return the last available data at a peer node (A).

System Examples

  • Redis, PostgreSQL, and Neo4J are consistent and available (CA) they don’t distribute data
  • MongoDB and HBase are generally consistent and partition tolerant (CP). In the event of a network partition, they can become unable to respond to certain types of queries (for example, in a Mongo replica set you flag slaveok to false for reads).
  • Finally, CouchDB is available and partition tolerant (AP). Even though two or more CouchDB servers can replicate data between them, CouchDB doesn’t guarantee consistency between any two servers.
  • Eventually consistent: DNS: the newly registered domain may take some time to propagate to all DNS servers. All DNS servers are available with their current records.

Real World Examples

  • For the checkout process you always want to honor requests to add items to a shopping cart because it's revenue producing. In this case you choose high availability. Errors are hidden from the customer and sorted out later.
  • When a customer submits an order you favor consistency because several services--credit card processing, shipping and handling, reporting--are simultaneously accessing the data.