Distributed Systems - Overview

Updated: 2020-12-29

Working on a non-trivial project at a modern software company requires some knowledge of distributed systems. At today's data scale, most systems are distributed in some form. This is an attempt to create a mind map to help you navigate the field.

To some extent, "distributed systems" is about building reliable software on top of unreliable hardware.
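One small, concrete instance of that idea is retrying a call that fails because a machine or network link is temporarily unavailable. The sketch below is only an illustration, not a prescribed technique: `call_with_retries`, `flaky`, and the delay values are hypothetical names and numbers chosen for the example, and it assumes the operation is safe to repeat.

```python
import random
import time


def call_with_retries(operation, attempts=3, base_delay=0.01):
    """Run operation(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Out of attempts: surface the failure to the caller.
            if attempt == attempts - 1:
                raise
            # Wait longer after each failure, plus jitter so that many
            # clients do not retry at exactly the same moment.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Hypothetical unreliable dependency: fails twice, then succeeds.
outcomes = iter([ConnectionError("down"), ConnectionError("down"), "ok"])


def flaky():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retries(flaky)
```

Retries only help with transient faults, and they are safe only when repeating the operation has no unwanted side effects.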

The Stack

  • Servers (bare metal, VMs, containers, serverless functions) to run the applications and backends: AWS EC2 or GCP GCE
  • Databases to store data and make it readily available to applications, and indexes to speed up searches and filters
  • Caches to speed up reads by remembering results of expensive operations
  • Data Warehouse to store historical data for analytics
  • Storage to store files and objects (can also be used to serve static websites)
  • Message queues to communicate between processes and enable asynchronous operations
  • Logging: AWS Kinesis, Fluentd, GCP Cloud Logging
  • Monitoring to monitor system health and key business metrics, and send out alerts: GCP Cloud Monitoring, Datadog
  • Service Discovery, Configs and Secrets: Consul/Vault
  • Orchestration / Provisioning: Kubernetes, Terraform
  • Package format: a common way to package all the applications, e.g. Docker
  • Code management: git or hg
  • CI/CD: continuous integration and deployment
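
To make the cache entry in the list above more concrete, here is a minimal in-process sketch of "remembering results of expensive operations". It is an illustration under stated assumptions, not how any particular cache product works: `TTLCache`, `slow_lookup`, and the 60-second expiry are hypothetical, and a real deployment would more likely use a shared cache service than a Python dictionary.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny read-through cache: load on a miss, reuse the value until it expires."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        # key -> (expiry time on the monotonic clock, cached value)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: skip the expensive operation
        value = self._loader(key)  # miss or expired: do the expensive work once
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical expensive operation; `calls` records how often it really runs.
calls: List[str] = []


def slow_lookup(key: str) -> str:
    calls.append(key)
    return key.upper()


cache = TTLCache(slow_lookup, ttl_seconds=60.0)
first = cache.get("user:1")   # miss: calls slow_lookup
second = cache.get("user:1")  # hit: served from memory
```

The expiry time is the main design choice here: a longer value saves more work but serves stale data for longer after the underlying source changes.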

More Stacks