Distributed Systems - Notes

Updated: 2021-02-21
  • SQL->NoSQL, Data Warehouse->Data Lake: think less about how to put data in, and more about how you will pull it out.
  • Do you want it right? Then you need read-your-writes consistency. Do you want it right now? Then you are bounded by a fast (low-latency) SLA.
  • DevOps replaces sysadmin
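
The "read your writes" point above can be sketched with a primary and a lagging replica. This is a toy model, not any particular database's protocol: the `Primary`, `Replica` and `Client` classes and the version counter are all assumptions made for illustration.

```python
class Primary:
    """Accepts writes and bumps a version number on each one."""
    def __init__(self):
        self.data = {}
        self.version = 0

    def write(self, key, value):
        self.data[key] = value
        self.version += 1
        return self.version


class Replica:
    """Copies the primary's state only when sync() is called (simulated lag)."""
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.version = 0

    def sync(self):
        self.data = dict(self.primary.data)
        self.version = self.primary.version


class Client:
    """Remembers the version of its last write and skips staler replicas."""
    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.last_written = 0

    def write(self, key, value):
        self.last_written = self.primary.write(key, value)

    def read(self, key):
        if self.replica.version >= self.last_written:
            return self.replica.data.get(key)  # replica is fresh enough
        return self.primary.data.get(key)      # fall back to see our own write


primary = Primary()
replica = Replica(primary)
client = Client(primary, replica)

client.write("k", "v1")
stale_read = replica.data.get("k")  # the replica has not synced yet
own_read = client.read("k")         # the client still sees its own write
replica.sync()
synced_read = client.read("k")      # now served from the replica
```

The fallback to the primary is where "right" costs latency: the more often the replica lags, the more reads pay for the slower path.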

Servers

Servers are long-lived pieces of software that provide services. A server is a collection of running services. The most common kinds are HTTP and RPC request-handling services.

Data Formats / Serialization

  • gRPC: RPC framework from Google (not a data format itself; uses Protobuf by default)
  • Protobuf: created and used by Google
  • Thrift: created at Facebook, now an Apache project
  • Avro
  • RCFile (Record Columnar File): Facebook
  • ORC (Optimized Row Columnar): Hortonworks
  • Parquet: Cloudera and Twitter
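
The formats above split into row-oriented layouts (Avro, and Protobuf or Thrift messages) and columnar layouts (RCFile, ORC, Parquet). A minimal pure-Python sketch of the difference, using made-up records; real formats add schemas, encoding and compression on top of this:

```python
# Hypothetical sample records, for illustration only.
records = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 5},
    {"user": "c", "clicks": 2},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per field (columnar layout)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(records)

# Row layout: an aggregate over one field still walks every whole record.
total_from_rows = sum(row["clicks"] for row in records)

# Columnar layout: the same aggregate reads a single column.
total_from_columns = sum(columns["clicks"])
```

This is why columnar formats tend to suit analytical queries that touch a few fields across many rows.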

Parquet vs Arrow:

  • Parquet: columnar format on disk
  • Arrow: columnar format in memory

Data Processing

Stream Processing vs Batch Processing

  • Observe what is happening, and act on events as they occur (stream processing)
  • Periodically crunch a large amount of accumulated data (batch processing)
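
A rough sketch of the two styles over the same data. The readings and the `THRESHOLD` value are invented for illustration:

```python
from typing import Iterable, Iterator

THRESHOLD = 100  # hypothetical alert threshold


def stream_alerts(events: Iterable[int]) -> Iterator[str]:
    """Stream style: look at each event as it arrives and act immediately."""
    for value in events:
        if value > THRESHOLD:
            yield f"alert: {value}"


def batch_average(events: Iterable[int]) -> float:
    """Batch style: crunch everything accumulated so far in one pass."""
    values = list(events)
    return sum(values) / len(values) if values else 0.0


readings = [40, 120, 80, 160]
alerts = list(stream_alerts(readings))
average = batch_average(readings)
```

The stream function never needs the full data set, so it also works on an unbounded source; the batch function needs a bounded input before it can answer.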

Batch Data Processing

https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a

Integration Patterns

  • API: contract driven
  • Event Driven
  • Data Stream Driven

State

  • Session state: kept across requests while the process is running. A stateful session remembers things between requests; a stateless one remembers nothing on the session.
  • Durable state: kept across failures, so it is still there when you come back later.
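
A small sketch of the distinction, assuming a JSON file stands in for durable storage. The class names are hypothetical, and a real system would also need fsync and atomic writes:

```python
import json
import os
import tempfile


class SessionCounter:
    """Session state: lives only in this object's memory."""
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


class DurableCounter:
    """Durable state: each update is written to a file and reloaded on start."""
    def __init__(self, path):
        self.path = path
        self.count = 0
        if os.path.exists(path):
            with open(path) as f:
                self.count = json.load(f)["count"]

    def increment(self):
        self.count += 1
        with open(self.path, "w") as f:
            json.dump({"count": self.count}, f)
        return self.count


def simulate_restart():
    """Increment both counters twice, 'crash', then recreate them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "counter.json")
        session, durable = SessionCounter(), DurableCounter(path)
        for _ in range(2):
            session.increment()
            durable.increment()
        # Simulated failure: drop the old objects and build fresh ones.
        session, durable = SessionCounter(), DurableCounter(path)
        return session.count, durable.count
```

After the simulated restart the session counter is back at zero, while the durable counter picks up where it left off.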

Others

Design Principle: Favor composition over inheritance.
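
One way to picture this principle: the `Reporter` below holds a formatter rather than subclassing one, so behaviour is swapped by passing in a different object. All class names are made up for the example.

```python
import json


class JsonFormatter:
    def format(self, payload: dict) -> str:
        return json.dumps(payload, sort_keys=True)


class KeyValueFormatter:
    def format(self, payload: dict) -> str:
        return " ".join(f"{k}={v}" for k, v in sorted(payload.items()))


class Reporter:
    """Has a formatter (composition) instead of being one (inheritance)."""
    def __init__(self, formatter):
        self.formatter = formatter

    def report(self, payload: dict) -> str:
        return self.formatter.format(payload)


payload = {"status": "ok", "code": 200}
json_report = Reporter(JsonFormatter()).report(payload)
kv_report = Reporter(KeyValueFormatter()).report(payload)
```

Adding a third output style means writing one new formatter class; `Reporter` itself does not change.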

IoT Analytics: distributed model scoring + centralised model building

Reactive Platform: VStack is a reactive platform in the sense that it uses an asynchronous, message-oriented architecture (message-driven design is the core trait of a "reactive" system).

Node.js: fully asynchronous, event-driven, non-blocking I/O, built on a single-threaded event loop
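
The event-loop model can be illustrated with Python's asyncio, used here only as an analogue of the Node.js loop and not as a description of Node.js internals. The handler names and delays are arbitrary:

```python
import asyncio


async def handle(name: str, delay: float, log: list) -> None:
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # non-blocking wait: control returns to the loop
    log.append(f"{name} done")


async def main() -> list:
    log = []
    # Both handlers run on one thread; the loop interleaves them at each await.
    await asyncio.gather(
        handle("slow", 0.1, log),
        handle("fast", 0.01, log),
    )
    return log


log = asyncio.run(main())
```

The "fast" handler finishes first even though it was started second, because the "slow" handler's wait does not block the single thread.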

nonblocking RPC server based on Netty

There's a difference between (A) locking (waiting, really) on access to a critical section (where you spinlock, yield your thread, etc.) and (B) locking the processor to safely execute a synchronization primitive (mutexes/semaphores).
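
A sketch of the two waiting strategies in case (A), using Python's `threading.Lock`. Case (B) happens below the language level, inside the lock implementation, so it is not shown. The worker functions and thread counts are assumptions made for the example.

```python
import threading

lock = threading.Lock()
counter = 0


def blocking_increment(n):
    """Wait by blocking: the thread is parked until the lock is free."""
    global counter
    for _ in range(n):
        with lock:
            counter += 1


def spinning_increment(n):
    """Wait by spinning: keep retrying a non-blocking acquire."""
    global counter
    for _ in range(n):
        while not lock.acquire(blocking=False):
            pass  # busy-wait; burns CPU instead of sleeping
        try:
            counter += 1
        finally:
            lock.release()


def run(worker, n=200, threads=4):
    """Reset the counter, run the worker in several threads, return the total."""
    global counter
    counter = 0
    pool = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter
```

Both strategies protect the critical section equally well; they differ in what the waiting thread does in the meantime. Spinning tends to pay off only when the lock is held very briefly.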