Distributed Systems - Monitoring
- Service Level Indicator (SLI): what to measure, e.g. latency, availability, data quality.
- Service Level Objective (SLO): the desired target for an SLI, e.g. p99 latency of at most 100ms.
- Service Level Agreement (SLA): an externally visible contract built on an SLO, e.g. if p99 latency exceeds 100ms, the provider refunds the customer or pays a financial penalty.
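To make the SLI/SLO distinction concrete, here is a minimal sketch in Python. It assumes the SLI is p99 request latency computed with the nearest-rank method and the SLO is the 100ms target from the example above. The function names (`percentile`, `check_slo`) and the sample data are hypothetical, chosen only for illustration; real monitoring systems usually estimate percentiles from histograms rather than raw samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based rank, computed so that whole-number inputs stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_slo(latencies_ms, slo_ms=100.0, pct=99):
    """Compute the SLI (pct-th percentile latency) and compare it to the SLO."""
    sli = percentile(latencies_ms, pct)
    return {"sli_ms": sli, "slo_ms": slo_ms, "met": sli <= slo_ms}


# Hypothetical window of 100 requests: 98 fast ones and 2 slow ones.
latencies = [20.0] * 98 + [250.0] * 2
report = check_slo(latencies)
print(report)
```

In this sample the two slow requests are enough to push p99 above the target, so the SLO is missed. Whether that also triggers a refund or penalty is what the SLA would define.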
- Prometheus: monitoring system and time series database
- Originally built at SoundCloud
- Part of CNCF; the second hosted project, after Kubernetes.
- Official website: https://prometheus.io/
- GitHub: https://github.com/prometheus
- OpenCensus and OpenTracing have merged to form OpenTelemetry.
- OpenCensus originated from Google.
- OpenTelemetry is part of CNCF
- Official website: https://opentelemetry.io/
- GitHub: https://github.com/open-telemetry/
- Machine-generated data or user-generated data
- Real-time or historical (log-based)