Distributed Systems - Overview
Updated: 2020-12-29
Working on any non-trivial project at a modern software company requires some knowledge of distributed systems. At today's data scale, almost everything is distributed. This is an attempt to create a mind map to help you navigate the field.
To some extent, "distributed systems" is about building reliable software on top of unreliable hardware.
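One common way software copes with unreliable hardware and networks is to retry transient failures instead of giving up on the first error. Below is a minimal Python sketch, assuming the operation is safe to repeat (idempotent) and that `ConnectionError` / `TimeoutError` signal transient faults. `retry_with_backoff`, its parameters, and `flaky_rpc` are hypothetical names for illustration, not part of any particular library.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call op(), retrying transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Double the wait on each attempt (capped at max_delay), then pick a random
            # point in [0, delay] so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_rpc() -> str:
        """Hypothetical remote call that fails twice before succeeding."""
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("connection reset")
        return "ok"

    print(retry_with_backoff(flaky_rpc, base_delay=0.01), calls["n"])  # ok 3
```

Errors outside `retry_on` (for example a `ValueError` from bad input) propagate immediately, since retrying them would not help. Retries are only safe for idempotent operations: retrying a non-idempotent write can apply its effect twice.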
The Stack
- Servers (bare metal, VMs, containers, or serverless functions) to run the applications and backends: AWS EC2, GCP GCE
- Databases to store data and make it readily available to applications, and indexes to speed up searches and filters
- Caches to speed up reads by remembering results of expensive operations
- Data warehouses to store historical data for analytics
- Storage to store files and objects (can also be used to serve static websites)
- Message queues to communicate between processes and enable async operations
- Logging: AWS Kinesis, Fluentd, GCP Cloud Logging
- Monitoring to monitor system health and key business metrics, and send out alerts: GCP Cloud Monitoring, Datadog
- Service Discovery, Configs and Secrets: Consul/Vault
- Orchestration / Provisioning: Kubernetes, Terraform
- Package format: a common way to package all the applications, e.g. Docker
- Code management: git or hg
- CI/CD: continuous integration and deployment
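Of the components listed above, a cache is the easiest to show in a few lines. Below is a minimal in-process, cache-aside style sketch in Python, assuming a single process and a time-to-live (TTL) expiry policy; in a real deployment the cache is usually a separate service (e.g. Redis or Memcached) shared by many servers. `TTLCache`, its `loader`/`clock` parameters, and `slow_square` are hypothetical names for illustration.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """Remembers results of an expensive loader for `ttl` seconds (hypothetical helper)."""

    def __init__(self, loader: Callable[[K], V], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[K, Tuple[float, V]] = {}  # key -> (expiry time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]  # fresh entry: skip the expensive work
        # Miss or expired entry: do the expensive work, then remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: K) -> None:
        """Drop a key after the underlying data changes, so the next read reloads it."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    fake_now = [0.0]

    def slow_square(n: int) -> int:
        return n * n  # stands in for an expensive database query

    cache = TTLCache(slow_square, ttl=10.0, clock=lambda: fake_now[0])
    cache.get(4)  # miss: calls slow_square
    cache.get(4)  # hit: served from the cache
    fake_now[0] = 11.0
    cache.get(4)  # entry expired: miss again
    print(cache.hits, cache.misses)  # 1 2
```

The TTL bounds how stale a cached value can get, while `invalidate` lets a writer evict a key as soon as the data changes. This sketch has no size limit, so a production cache would also need an eviction policy such as LRU.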
More Stacks
- Stack Share: https://stackshare.io/
- Cloud Native Computing Foundation: https://www.cncf.io/