Distributed Logging

Updated: 2020-06-29

Why Logging

  • Debug logging: instead of keeping info, warn, and error logs only on the local machine, send them to a centralized place so that stack traces can be viewed on a dedicated web page, regardless of where the code is executed, whether on your dev server or on some random node in a staging/prod cluster.
  • System metrics: QPS, latency, availability, request counts, etc. These describe the health of the system (cluster) and are largely unrelated to business logic.
  • Business metrics: especially for billing purposes.

Distributed Log

  • High write availability and durable record storage.
  • Repeatable total order: every reader sees the records in the same order, on every read.
  • Append-only: existing records cannot be modified.
  • Relatively long-lived: retention can be days or months. It also depends on the privacy policy: PII data may need to be deleted after the retention period, while anonymized data may live longer.
  • Record-oriented: data is written into the log as indivisible records rather than as individual bytes.
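
The properties above can be illustrated with a minimal in-memory sketch. It is a single-process stand-in, not a distributed implementation: the names Record, AppendOnlyLog, lsn, append, and read_from are hypothetical and chosen only for illustration, and durability and availability are not modeled at all.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)  # frozen: a record cannot be modified once created
class Record:
    lsn: int        # log sequence number: the record's position in the total order
    payload: bytes  # opaque record body, written as one indivisible unit


class AppendOnlyLog:
    """In-memory stand-in for a distributed log (hypothetical, single process)."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, payload: bytes) -> int:
        """Append one whole record and return its sequence number."""
        lsn = len(self._records)
        self._records.append(Record(lsn, bytes(payload)))
        return lsn

    def read_from(self, start_lsn: int = 0) -> Iterator[Record]:
        """Replay records from start_lsn, in the same order on every call."""
        return iter(self._records[start_lsn:])
```

There is deliberately no update or delete method: the only way to change the log is to append, and a reader that starts from the same sequence number always replays the same records in the same order.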

Products and Solutions

  • Commercial Solutions: Splunk, Sumo Logic
  • Open Source Solutions: Kibana (usually deployed together with Elasticsearch and Logstash as the ELK stack)
  • Facebook LogDevice
  • Google Cloud Logging

Different Aspects

  • machine logs vs user logs
  • real-time vs historical
  • collected logs vs processed logs (sessionization, normalization, anonymization)
  • debug logs (INFO/WARNING/ERROR) vs event logs (revenue logs, click logs)
  • logs: retention, wipeout, takeout

Versioned Processed Logs

Versioned directories (e.g. /prefix/YYYY/MM/DD/<version>/) should be used.

For Log Consumer

With non-versioned directories: suppose an analysis job is reading log files when the data processing pipeline finishes and updates the old data in place. The analysis job ends up reading a mix of old and new files, with duplicated or missing data.

With versioned directories: the analysis job can keep reading the old files until it finishes, so it gets a consistent result. If it runs again, it can pick up the new data.
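
A consumer can get this behavior by resolving the version directory once, at the start of a run, and reading only from that directory. The sketch below assumes a hypothetical marker file named _READY that signals a complete version, and version names that sort lexicographically (for example zero-padded names such as v0001); neither convention comes from a specific system.

```python
from pathlib import Path
from typing import List, Optional

READY_MARKER = "_READY"  # hypothetical marker file name, an assumption of this sketch


def latest_ready_version(day_dir: Path) -> Optional[Path]:
    """Return the newest version directory under day_dir that is marked ready."""
    ready = [
        d for d in day_dir.iterdir()
        if d.is_dir() and (d / READY_MARKER).exists()
    ]
    # Assumes version names sort lexicographically, e.g. zero-padded "v0001".
    return max(ready, key=lambda d: d.name, default=None)


def list_log_files(version_dir: Path) -> List[Path]:
    """List the data files of one pinned version, excluding the marker file."""
    return sorted(
        p for p in version_dir.iterdir()
        if p.is_file() and p.name != READY_MARKER
    )
```

An analysis job would call latest_ready_version once, keep the returned path for the whole run, and call it again only on the next run. Versions that appear in the meantime do not affect the files it is already reading.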

For Log Producer

With non-versioned directories: the new logs must either be generated in place, or be generated in another directory and then copied over.

With versioned directories: the new logs are generated in a new version directory, which can then be marked as ready or live and made visible to consumers. Rolling back is also easier.
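
On the producer side, this can be sketched as "write everything, then publish". The example assumes a local filesystem and a hypothetical marker file named _READY that is created last; the function names publish_version and roll_back are illustrative only. Other storage systems may need a different way to mark a version as live.

```python
from pathlib import Path
from typing import Dict

READY_MARKER = "_READY"  # hypothetical marker file name, an assumption of this sketch


def publish_version(day_dir: Path, version: str, files: Dict[str, bytes]) -> Path:
    """Write files into a fresh version directory, then mark it ready."""
    version_dir = day_dir / version
    # exist_ok=False: refuse to overwrite a version that already exists.
    version_dir.mkdir(parents=True, exist_ok=False)
    for name, data in files.items():
        (version_dir / name).write_bytes(data)
    # The marker is created last, so a half-written version is never marked ready.
    (version_dir / READY_MARKER).touch()
    return version_dir


def roll_back(version_dir: Path) -> None:
    """Hide a bad version from consumers by removing its marker; data is kept."""
    (version_dir / READY_MARKER).unlink()
```

Because earlier versions are left untouched, rolling back only requires un-marking the bad version: consumers that select the newest ready version then fall back to the previous one.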