Distributed Logging

Updated: 2018-06-19
  • Commercial Solutions: Splunk, Sumo Logic
  • Open Source Solutions: Kibana

Facebook LogDevice: https://code.facebook.com/posts/357056558062811/logdevice-a-distributed-data-store-for-logs/

Where to use a log

Want to connect two stages of a data processing pipeline without having to worry about flow control or data loss? Have one stage write into a log and the other read from it. Maintaining an index on a large distributed database? Have the indexing service read the update log to apply all the changes in the right order. Got a sequence of work items to be executed in a specific order a week later? Write them into a log, have the consumer lag a week. Dream of distributed transactions? A log with enough capacity to order all your writes makes them possible. Durability concerns? Use a write-ahead log.
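As a rough illustration of the "have the consumer lag a week" idea, here is a minimal sketch of a reader that only hands out records once they are older than a configured lag. Everything in it is an assumption made for illustration: the log is a plain in-memory list of (written_at, item) pairs with non-decreasing timestamps, and the names LaggingConsumer, poll, and WEEK_S are hypothetical, not part of any real logging system.

```python
from typing import List, Tuple

WEEK_S = 7 * 24 * 3600  # one week in seconds (illustrative constant)


class LaggingConsumer:
    """Reads a shared append-only list, but only records older than lag_s."""

    def __init__(self, log: List[Tuple[float, str]], lag_s: float) -> None:
        self._log = log        # (written_at, item) pairs, appended in time order
        self._lag_s = lag_s
        self._cursor = 0       # index of the next unread record

    def poll(self, now: float) -> List[str]:
        """Return, in log order, every unread item that is at least lag_s old."""
        ready: List[str] = []
        while self._cursor < len(self._log):
            written_at, item = self._log[self._cursor]
            if now - written_at < self._lag_s:
                break          # this record and all later ones are still too fresh
            ready.append(item)
            self._cursor += 1
        return ready


log: List[Tuple[float, str]] = [(0.0, "a"), (100.0, "b")]
consumer = LaggingConsumer(log, WEEK_S)
early = consumer.poll(now=WEEK_S - 1)     # nothing is a week old yet
first = consumer.poll(now=WEEK_S)         # "a" has just become a week old
log.append((200.0, "c"))                  # the writer keeps appending
rest = consumer.poll(now=WEEK_S + 200)    # "b" and "c" are now old enough
```

The writer never waits for the consumer. The log absorbs the backlog, which is why no separate flow control is needed between the two stages.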

A distributed log makes two promises:

  • highly available and durable record storage
  • a repeatable total order on those records

Record-oriented means that data is written into the log in indivisible records, rather than individual bytes. More importantly, a record is the smallest unit of addressing: a reader always starts reading from a particular record (or from the next record to be appended to the log) and receives data one or more records at a time. Still more importantly, record numbering is not guaranteed to be continuous. There may be gaps in the numbering sequence, and the writer does not know in advance what log sequence number (LSN) its record will be assigned upon a successful write.
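The record-oriented properties above can be sketched with a toy in-memory log. This is a single-process illustration under stated assumptions, not how LogDevice or any real system is implemented: the class RecordLog, its methods, and the epoch_size parameter are hypothetical names, and bump_epoch exists only to simulate an event that leaves a gap in the LSN sequence.

```python
import bisect
from typing import List, Tuple


class RecordLog:
    """Toy record-oriented log: whole records in, whole records out."""

    def __init__(self, epoch_size: int = 1000) -> None:
        self._lsns: List[int] = []        # strictly increasing, possibly with gaps
        self._payloads: List[bytes] = []
        self._next_lsn = 1
        self._epoch_size = epoch_size     # hypothetical size of an LSN jump

    def append(self, payload: bytes) -> int:
        """Store one indivisible record; the writer learns the LSN only here."""
        lsn = self._next_lsn
        self._lsns.append(lsn)
        self._payloads.append(payload)
        self._next_lsn = lsn + 1
        return lsn

    def bump_epoch(self) -> None:
        """Simulate an event that skips ahead and leaves a gap in the numbering."""
        self._next_lsn = (self._next_lsn // self._epoch_size + 1) * self._epoch_size

    def tail_lsn(self) -> int:
        """LSN that the next appended record would receive."""
        return self._next_lsn

    def read(self, from_lsn: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records records with LSN >= from_lsn, in order."""
        start = bisect.bisect_left(self._lsns, from_lsn)
        end = start + max_records
        return list(zip(self._lsns[start:end], self._payloads[start:end]))


log = RecordLog()
lsn_a = log.append(b"a")
lsn_b = log.append(b"b")
log.bump_epoch()                  # numbering jumps, leaving a gap
lsn_c = log.append(b"c")
from_gap = log.read(lsn_b + 1)    # starting inside the gap yields the next record
```

Readers address the log by LSN and must tolerate gaps: asking for an LSN that was never assigned simply yields the next record that exists.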

Logs are naturally append-only. No support for modifying existing records is necessary or provided. Logs are expected to live for a relatively long time (days, months, or even years) before they are deleted. The primary space reclamation mechanism for logs is trimming, or dropping the oldest records according to either a time- or space-based retention policy.
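Trimming under a time- or space-based retention policy might look like the sketch below. It assumes records are held oldest-first in a deque as (lsn, appended_at, payload) tuples. The function name trim and the parameters max_age_s and max_bytes are hypothetical, chosen only to illustrate the two policy types.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Record = Tuple[int, float, bytes]  # (lsn, appended_at, payload)


def trim(records: Deque[Record], now: float,
         max_age_s: Optional[float] = None,
         max_bytes: Optional[int] = None) -> int:
    """Drop the oldest records until the policy holds; return how many were dropped."""
    dropped = 0
    total = sum(len(payload) for _, _, payload in records)
    while records:
        _, appended_at, payload = records[0]   # only the oldest record is a candidate
        too_old = max_age_s is not None and now - appended_at > max_age_s
        too_big = max_bytes is not None and total > max_bytes
        if not (too_old or too_big):
            break
        records.popleft()
        total -= len(payload)
        dropped += 1
    return dropped


records: Deque[Record] = deque([
    (1, 0.0, b"aaaa"),
    (2, 50.0, b"bbbb"),
    (3, 90.0, b"cccc"),
])
by_age = trim(records, now=100.0, max_age_s=60.0)   # drops LSN 1 (100 s old)
by_size = trim(records, now=100.0, max_bytes=4)     # drops LSN 2 to fit in 4 bytes
```

Because records are only ever removed from the old end, trimming never disturbs the order of the records that remain.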

Common to most of our logging applications is the requirement of high write availability.