Updated: 2020-06-29

The core of a distributed storage system

  • sharding strategy
  • metadata storage


3 types of storage: block storage, file storage, object storage(blob=binary large object)

The unit of these 3 types:

  • block storage: evenly sized chunks.
  • file storage: a hierarchy of files in folders
  • blob/object storage: immutable objects, e.g. images, audio or other multimedia objects; sometimes binary executable code is stored as a blob.

Example of storage systems

  • Online Transaction Processing Databases (OLTP)

    • Facebook Graph, mission critical, strong consistency, core services
  • Semi-online Light Transaction Processing Databases (SLTP)

    • Facebook Messages and Facebook Time Series
  • Immutable DataStore

    • Photos, videos, etc
  • Analytics DataStore

    • Data Warehouse, Logs storage

Facebook example. This is adapted from this slide

Service Technology Bottlenecks Latency Consistency Durability
Facebook Graph MySQL/TAO Random read IOPS few ms quickly consistent across data centers no data loss
Messages and Time Series HBase and HDFS Write IOPS/storage capacity < 200 ms consistent within a data center no data loss
Photos / Videos Haystack storage capacity < 250 ms immutable no data loss
Data Warehouse Hive / Presto / HDFS storage capacity < 1min not consistent across data centers no silent data loss

Amazon's Offerings

  • S3: object storage
  • EBS: block storage
  • EFS: file storage

EBS and S3

  • EBS: high latency(comparing to databases)
  • EBS is mountable storage; it can be mounted as a device to an EC2 instance, NAS
  • S3 not mountable, a storage service, not a device
  • EBS can be thought of as external hard drive, while S3 is more akin to DropBox
  • Glacier is S3 for archived files.

Google's Offerings

  • Google Cloud Storage: object storage
  • Google Cloud Filestore: file storage
  • Persistent Disk: block storage

Redhat's Offerings

  • Gluster: scalable file storage with object capabilities
  • Ceph: object first, scalable object storage with block and file capabilities.

    • CephFS: a POSIX-compliant network file system

Ceph and Gluster are both SDS(Software-defined Storage). Ceph is for shorter-term stoarge and more frequent user access. Gluster should not be used for something transactional, like a database or something that depends on really strict locking.

Redhat recommends XFS, not ZFS.

Distributed File Systems

Distributed file systems: GFS, Colossus, Alluxio, CephFS, HDFS

  • Cluster level, fault tolerant, distributed file systems:

    • append only
    • not for structured data(use database instead)
    • not optimized for small files
    • cluster level, not data center level, data destroyed after the cluster turns down
  • HDFS is the open source version of GFS(Google File System)
  • Colossus is the successor of GFS
  • Spanner uses Colossus to store its tablets

Open-source Projects


  • DAS: Directly Attached Storage.
  • NAS: Network Attached Storage.