Distributed Systems - Monitoring

Updated: 2020-01-15

SLI vs SLO vs SLA

  • Service Level Indicator (SLI): what to measure, e.g. latency, availability, data quality.
  • Service Level Objective (SLO): the desired target for an SLI, e.g. p99 latency of at most 100 ms.
  • Service Level Agreement (SLA): an externally visible contract about an SLO, e.g. if p99 latency exceeds 100 ms, the provider issues a refund or pays a financial penalty.
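
As a rough illustration of how the three terms relate, the sketch below computes a p99 latency SLI from raw samples and checks it against the 100 ms SLO used in the examples above. The function names (percentile, meets_slo) and the nearest-rank percentile method are assumptions made for this sketch, not something taken from a specific monitoring tool.

```python
def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method.

    pct is an integer between 1 and 100 (assumption for this sketch).
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest rank: ceil(pct / 100 * n), done in integer math to avoid float error.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms, target_ms=100, pct=99):
    """SLI = p99 latency of the samples; SLO = that SLI is at most target_ms."""
    return percentile(latencies_ms, pct) <= target_ms


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    window = [12, 15, 9, 48, 230, 17, 22, 31, 14, 11]
    sli = percentile(window, 99)
    print(f"p99 latency: {sli} ms, SLO met: {meets_slo(window)}")
```

An SLA would sit on top of this check: if the SLO is missed over the agreed measurement window, the contractual remedy (refund or penalty) applies.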

Open Source Tools

Prometheus

OpenTelemetry

Different Aspects

  • machine-generated data or user-generated data
  • real-time or historical (log-based)