Distributed Systems - Notes

Updated: 2021-11-19
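  • SQL -> NoSQL, data warehouse -> data lake: think less about how data gets put in and more about how it will be pulled out (schema-on-read rather than schema-on-write).
  • Do you want it right? Read your writes. Do you want it right now? Serve within a fast SLA bound, possibly from a stale replica.
  • DevOps replaces the traditional sysadmin role.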
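A minimal sketch of the "right" option (read-your-writes), assuming a single primary and asynchronously updated replicas that each track the highest write version they have applied. `Replica`, `Session`, and the integer version counter are hypothetical illustration names, not a real library. The session remembers its own last write and only reads from a replica that has caught up to it; otherwise it pays the latency cost of going to the primary.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Hypothetical replica: key-value data plus the highest write version applied.
    data: dict = field(default_factory=dict)
    version: int = 0

    def apply(self, key, value, version):
        self.data[key] = value
        self.version = max(self.version, version)


class Session:
    """Client session that guarantees read-your-writes over lagging replicas."""

    def __init__(self, primary: Replica, replicas: list[Replica]):
        self.primary = primary
        self.replicas = replicas
        self.last_write = 0  # version of this session's most recent write

    def write(self, key, value) -> int:
        version = self.primary.version + 1
        self.primary.apply(key, value, version)  # replicas catch up asynchronously
        self.last_write = version
        return version

    def read(self, key):
        # Fast path: any replica that has already applied our last write.
        for replica in self.replicas:
            if replica.version >= self.last_write:
                return replica.data.get(key)
        # Slow path: the primary always has this session's writes.
        return self.primary.data.get(key)


primary, lagging = Replica(), Replica()
session = Session(primary, [lagging])
session.write("x", 1)
session.read("x")  # returns 1, served by the primary because `lagging` is behind
```

Dropping the version check in `read` gives the "right now" behavior instead: always answer from the nearest replica, accepting that the answer may be stale.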


Servers are long-lived pieces of software that provide services. A server is a collection of running services. The most common kinds are HTTP and RPC request-handling services.

Server Traffic Management

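  • load balancing: spread requests across backend instances
  • load shedding (throttling): reject excess requests early when overloaded, rather than letting every request slow down
  • RPC client behaviors (e.g., timeouts, retries, backoff)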
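A small sketch of the first two items, assuming round-robin as the balancing policy and a token bucket as the shedding policy (both are common choices, not the only ones). `RoundRobinBalancer`, `TokenBucket`, and `handle` are hypothetical names; the clock is injectable so the behavior can be tested deterministically.

```python
import itertools
import time


class RoundRobinBalancer:
    """Load balancing: hand out backends in rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class TokenBucket:
    """Load shedding: admit about `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # shed: caller rejects cheaply (e.g., HTTP 429, gRPC RESOURCE_EXHAUSTED)


def handle(request, balancer, bucket):
    # Shed before doing any work, so overload does not slow down admitted requests.
    if not bucket.allow():
        return "rejected"
    return f"{request} -> {balancer.pick()}"
```

Shedding happens before a backend is picked, so rejected requests cost almost nothing.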

Data Formats / Serialization

  • gRPC: Google; an RPC framework that serializes with Protocol Buffers by default
  • Thrift: created and used by Facebook
  • Avro
  • RCFile (Record Columnar File): Facebook
  • ORC (Optimized Row Columnar): Hortonworks
  • Parquet: Cloudera and Twitter
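To make the binary formats above a little more concrete, here is a sketch of one encoding technique from the Avro specification: `int` and `long` values are written with variable-length zig-zag coding, so small magnitudes (positive or negative) take few bytes. Protocol Buffers uses the same zig-zag mapping for its `sint32`/`sint64` types. This covers only integer encoding for values in the signed 64-bit range, not a full Avro or Protobuf encoder.

```python
def zigzag(n: int) -> int:
    """Map signed 64-bit ints to unsigned: 0->0, -1->1, 1->2, -2->3, ..."""
    return (n << 1) ^ (n >> 63)  # Python's >> is arithmetic, so n >> 63 is 0 or -1


def encode_long(n: int) -> bytes:
    """Zig-zag, then base-128 varint: low 7 bits first, high bit = 'more bytes follow'."""
    z = zigzag(n)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes) -> int:
    z = shift = 0
    for b in buf:
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:  # last byte of this value
            break
    return (z >> 1) ^ -(z & 1)  # undo zig-zag


encode_long(-1)  # returns b"\x01"
encode_long(64)  # returns b"\x80\x01"
```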

Parquet vs Arrow:

  • Parquet: columnar on-disk (storage) format
  • Arrow: columnar in-memory format
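Both formats are columnar. A toy pure-Python illustration of what that means, with hypothetical field names; the real formats add type metadata, encodings, compression, and (in Parquet) row groups on top of this basic idea.

```python
rows = [  # row layout: each record's fields stored together
    {"user": "a", "country": "US", "spend": 10.0},
    {"user": "b", "country": "DE", "spend": 5.5},
    {"user": "c", "country": "US", "spend": 2.5},
]


def to_columns(records):
    """Pivot row records into one list per column (the columnar layout)."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)
# An analytic query like SUM(spend) scans one contiguous column
# instead of visiting every field of every row.
total_spend = sum(columns["spend"])  # 18.0
```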

Data Processing

Stream Processing vs Batch Processing

  • Observe what is happening, and act on events as they occur (stream processing)
  • Periodically crunch a large amount of accumulated data (batch processing)
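The two styles can be contrasted on the same input. A minimal sketch with hypothetical function names: the batch version needs all events up front, while the stream version keeps a running aggregate and can react (here, raise an alert flag) as each event arrives.

```python
from collections.abc import Iterable, Iterator


def batch_total(events: list[float]) -> float:
    """Batch: wait until data has accumulated, then crunch it all at once."""
    return sum(events)


def stream_totals(events: Iterable[float], alert_above: float) -> Iterator[tuple[float, bool]]:
    """Stream: update state per event and emit a result immediately."""
    running = 0.0
    for amount in events:
        running += amount
        yield running, running > alert_above  # act on the event as it occurs


events = [5.0, 7.0, 1.0]
batch_total(events)                 # 13.0, known only after all data is in
list(stream_totals(events, 10.0))  # alert fires on the second event
```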

Batch Data Processing


Integration Patterns

  • API-driven: contract-based request/response
  • Event-driven
  • Data-stream-driven
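A sketch contrasting the first two patterns, using hypothetical names (`get_order`, `EventBus`, topic `"order.shipped"`). In the API style the caller invokes a known contract and waits for the answer; in the event-driven style the producer publishes a fact and does not know or care who consumes it. A real system would use a broker rather than this in-process bus.

```python
from collections import defaultdict
from collections.abc import Callable


def get_order(order_id: int) -> dict:
    """API-driven: fixed contract, caller waits for the response."""
    return {"id": order_id, "status": "shipped"}


class EventBus:
    """Event-driven: in-process publish/subscribe (a stand-in for a real broker)."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The producer never references consumers directly.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
shipped = []
bus.subscribe("order.shipped", lambda e: shipped.append(e["id"]))
bus.publish("order.shipped", {"id": 42})  # shipped is now [42]
```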


  • Session state: kept across requests while a session is running. A stateful session remembers data between requests; a stateless one does not, so each request must carry everything it needs.
  • Durable state: survives failures, so it is still there when you come back later.
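A sketch of the three cases, with hypothetical names: in-memory session state (lost if the process dies), a stateless handler (the client sends the state each time), and durable state written to disk. The durable write uses the common temp-file, `fsync`, then `os.replace` pattern so a crash leaves either the old file or the new one, never a partial one; a fully crash-safe version on POSIX would also `fsync` the containing directory.

```python
import json
import os
import tempfile


class StatefulSession:
    """Session state: lives in process memory, lost on crash or restart."""

    def __init__(self):
        self.cart = []

    def add(self, item):
        self.cart.append(item)
        return list(self.cart)


def stateless_add(cart, item):
    """Stateless: the client sends all needed state with every request."""
    return cart + [item]


def save_durably(path, state):
    """Durable state: write a temp file, flush to disk, then atomically replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem, so the rename is atomic
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load(path):
    with open(path) as f:
        return json.load(f)
```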

Distributed Lock Service

A distributed lock service (e.g., Google's Chubby) is used for master election, as a storage service for some classes of small data (such as configuration and metadata), and as a name server.
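A toy, single-process sketch of lease-based master election, assuming a hypothetical `LockService` API (`try_acquire`, `holder`); this is not Chubby's interface. A real lock service is itself replicated with a consensus protocol and must deal with clock skew; here the clock is injectable so the lease logic can be tested. `holder()` also shows the name-server use: clients look up who the current master is.

```python
import time


class LockService:
    """In-memory stand-in for a lock service with leased locks (hypothetical API)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}  # lock name -> (holder, lease expiry time)

    def try_acquire(self, name, holder, lease_seconds):
        now = self.clock()
        current = self.locks.get(name)
        # Grant if free, expired, or already ours (which renews the lease).
        if current is None or current[1] <= now or current[0] == holder:
            self.locks[name] = (holder, now + lease_seconds)
            return True
        return False

    def holder(self, name):
        """Name-server use: who currently holds the lock (i.e., is master)?"""
        current = self.locks.get(name)
        if current is not None and current[1] > self.clock():
            return current[0]
        return None
```

Each candidate calls `try_acquire("service/master", me, lease)`; the winner acts as master and must renew before the lease expires, otherwise another candidate takes over.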


Design Principle: Favor composition over inheritance.
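A minimal illustration with hypothetical class names: `UserService` has a storage component (composition) instead of being a storage subclass (inheritance), so the backend can be swapped, for example for tests, without touching the class hierarchy.

```python
from typing import Protocol


class Storage(Protocol):
    def save(self, key: str, value: str) -> None: ...


class MemoryStorage:
    def __init__(self):
        self.items: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self.items[key] = value


class UserService:
    """Has-a Storage (composition), not is-a Storage (inheritance)."""

    def __init__(self, storage: Storage):
        self.storage = storage  # any object with a matching save() works

    def register(self, name: str) -> None:
        self.storage.save(f"user:{name}", name)


store = MemoryStorage()
UserService(store).register("ada")  # store.items == {"user:ada": "ada"}
```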

IoT analytics: distributed model scoring (on devices or at the edge) plus centralized model building (training).
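A sketch of the split, assuming a one-variable linear model purely as a placeholder. The central side fits the model on data aggregated from many devices; only the small parameter dict is shipped out, and each device scores its own readings locally without a round trip. `fit_line` and `score` are hypothetical names.

```python
def fit_line(xs, ys):
    """Centralized model building: ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return {"a": a, "b": mean_y - a * mean_x}  # small enough to push to every device


def score(model, x):
    """Distributed scoring: each device applies the shipped parameters locally."""
    return model["a"] * x + model["b"]


model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # {"a": 2.0, "b": 1.0}
score(model, 10)  # 21.0
```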

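Reactive Platform: VStack is a reactive platform in the sense that it uses an asynchronous, message-oriented architecture (message-driven communication is the foundation of "reactive" as defined by the Reactive Manifesto, alongside being responsive, resilient, and elastic).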
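A minimal sketch of asynchronous message passing using Python's `asyncio` (not VStack, whose API is not described here). A hypothetical counter "actor" owns its state, and the only way to interact with it is to put messages in its inbox; senders do not wait for `"inc"` to be processed, and a reply arrives through a future.

```python
import asyncio


async def counter_actor(inbox: asyncio.Queue):
    """Owns its state; interaction happens only through messages."""
    count = 0
    while True:
        msg, reply = await inbox.get()
        if msg == "inc":
            count += 1
        elif msg == "stop":
            reply.set_result(count)  # answer asynchronously via a future
            return


async def main():
    loop = asyncio.get_running_loop()
    inbox = asyncio.Queue()
    actor = asyncio.create_task(counter_actor(inbox))
    for _ in range(3):
        await inbox.put(("inc", None))  # fire-and-forget
    done = loop.create_future()
    await inbox.put(("stop", done))
    result = await done
    await actor
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints 3
```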

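Node.js: asynchronous, event-driven, non-blocking I/O built on an event loop; JavaScript code runs on a single thread, while libuv offloads some work (e.g., file system operations, DNS lookups) to a thread pool.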
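The same single-threaded event-loop model can be shown with Python's `asyncio` (used here as an analogue, not Node.js itself). Three simulated I/O waits overlap, so the total time is close to one wait rather than three, and every coroutine runs on the same OS thread. `fake_io` is a hypothetical name; replacing `asyncio.sleep` with the blocking `time.sleep` would stall the whole loop, which is the same pitfall as a synchronous call in Node.

```python
import asyncio
import threading
import time


async def fake_io(name, delay, seen_threads):
    seen_threads.add(threading.get_ident())  # record which OS thread runs this coroutine
    await asyncio.sleep(delay)  # non-blocking wait: control returns to the event loop
    return name


async def main():
    seen = set()
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_io("a", 0.2, seen), fake_io("b", 0.2, seen), fake_io("c", 0.2, seen)
    )
    elapsed = time.perf_counter() - start  # about 0.2 s, not 0.6 s
    return results, elapsed, len(seen)
```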

Non-blocking RPC servers on the JVM are commonly built on Netty, an asynchronous, event-driven network application framework.
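Netty is a JVM framework, so as a language-neutral sketch of the same idea here is a tiny non-blocking RPC server using Python's `asyncio` streams. The wire protocol (one JSON object per line, loosely in JSON-RPC style) and the `METHODS` table are hypothetical; a real framework would add an IDL, framing, error handling, and timeouts.

```python
import asyncio
import json

# Hypothetical method table; a real RPC framework would generate this from an IDL.
METHODS = {"add": lambda a, b: a + b}


async def handle_client(reader, writer):
    # One coroutine per connection; all connections share one event-loop thread.
    while line := await reader.readline():  # b"" at EOF ends the loop
        request = json.loads(line)
        result = METHODS[request["method"]](*request["params"])
        writer.write((json.dumps({"id": request["id"], "result": result}) + "\n").encode())
        await writer.drain()  # back-pressure: wait if the send buffer is full
    writer.close()
    await writer.wait_closed()


async def call(port, method, params):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write((json.dumps({"id": 1, "method": method, "params": params}) + "\n").encode())
    await writer.drain()
    reply = json.loads(await reader.readline())
    writer.close()
    await writer.wait_closed()
    return reply["result"]


async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)  # port 0: OS picks
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await call(port, "add", [2, 3])
```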

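There is a difference between (A) locking (really, waiting) to enter a critical section, where a thread spins, yields, or is put to sleep, and (B) locking the processor (e.g., the x86 LOCK prefix that makes an instruction such as compare-and-swap atomic) to safely execute the operation underlying a synchronization primitive such as a mutex or semaphore.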
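A sketch of (A), a spinlock that yields between attempts. Python exposes no raw atomic test-and-set, so a non-blocking `acquire` on `threading.Lock` stands in for it here; the atomicity that (B) refers to lives inside that primitive's implementation, not in this Python code. `SpinLock` is a hypothetical name; by contrast, a plain blocking `threading.Lock().acquire()` would put a waiting thread to sleep instead of spinning.

```python
import threading
import time


class SpinLock:
    """(A) Busy-wait to enter a critical section, yielding between attempts."""

    def __init__(self):
        self._flag = threading.Lock()  # stand-in for an atomic test-and-set flag

    def acquire(self):
        while not self._flag.acquire(blocking=False):  # atomic try-lock
            time.sleep(0)  # yield the thread instead of burning the CPU

    def release(self):
        self._flag.release()


lock = SpinLock()
counter = [0]


def work(iterations):
    for _ in range(iterations):
        lock.acquire()
        counter[0] += 1  # critical section
        lock.release()
```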