Storage

Updated: 2022-08-14

3 types

There are 3 types of storage: block storage, file storage, and object storage (blob = binary large object).

The unit of storage for each of these 3 types:

  • block storage: evenly sized chunks (blocks).
  • file storage: a hierarchy of files in folders.
  • blob/object storage: immutable objects, e.g. images, audio, or other multimedia objects; binary executable code is sometimes stored as a blob as well.
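
To make the three units concrete, here is a minimal in-memory sketch. It is not how any real storage product is implemented: the names `to_blocks`, `FileTree`, and `ObjectStore`, the 4-byte block size, and the content-derived object key are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny, chosen only for this illustration


def to_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Block storage: split data into evenly sized chunks, zero-padding the last one."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


class FileTree:
    """File storage: files addressed by a path through nested folders."""

    def __init__(self) -> None:
        self.root: dict = {}

    def write(self, path: str, data: bytes) -> None:
        *folders, name = path.strip("/").split("/")
        node = self.root
        for folder in folders:
            node = node.setdefault(folder, {})  # create folders on demand
        node[name] = data  # a file can be overwritten in place

    def read(self, path: str) -> bytes:
        node = self.root
        for part in path.strip("/").split("/"):
            node = node[part]
        return node


class ObjectStore:
    """Object storage: immutable blobs in a flat namespace, addressed by key."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Content-derived key: an illustrative choice, not a requirement.
        key = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(key, data)  # an existing object is never overwritten
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]
```

Blocks are addressed by position, files by path, and objects by key. Changing an object means storing a new object under a new key.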

Examples of storage systems

  • Online Transaction Processing Databases (OLTP)
    • Facebook Graph, mission critical, strong consistency, core services
  • Semi-online Light Transaction Processing Databases (SLTP)
    • Facebook Messages and Facebook Time Series
  • Immutable DataStore
    • Photos, videos, etc
  • Analytics DataStore
    • Data Warehouse, Logs storage

Facebook example, adapted from a slide deck:

| Service                  | Technology           | Bottlenecks                 | Latency  | Consistency                            | Durability          |
|--------------------------|----------------------|-----------------------------|----------|----------------------------------------|---------------------|
| Facebook Graph           | MySQL/TAO            | Random read IOPS            | few ms   | quickly consistent across data centers | no data loss        |
| Messages and Time Series | HBase and HDFS       | Write IOPS/storage capacity | < 200 ms | consistent within a data center        | no data loss        |
| Photos / Videos          | Haystack             | storage capacity            | < 250 ms | immutable                              | no data loss        |
| Data Warehouse           | Hive / Presto / HDFS | storage capacity            | < 1 min  | not consistent across data centers     | no silent data loss |

The core of a distributed storage system

  • sharding strategy
  • metadata storage
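
A rough sketch of how these two pieces fit together, assuming the simplest possible sharding strategy (a stable hash modulo the shard count) and an in-memory dictionary as the metadata store. `shard_for` and `TinyCluster` are hypothetical names, not taken from any real system.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Sharding strategy: map a key to a shard with a stable hash (hash-mod)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class TinyCluster:
    """Toy cluster: data lives in the shards, placement lives in the metadata."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]  # one dict per shard
        self.metadata = {}  # key -> {"shard": shard id, "size": bytes stored}

    def put(self, key: str, value: bytes) -> None:
        shard_id = shard_for(key, len(self.shards))
        self.shards[shard_id][key] = value
        self.metadata[key] = {"shard": shard_id, "size": len(value)}

    def get(self, key: str) -> bytes:
        # Readers consult the metadata first, then go to the shard it names.
        shard_id = self.metadata[key]["shard"]
        return self.shards[shard_id][key]
```

Hash-mod is the easiest strategy to show, but changing the number of shards remaps most keys; consistent hashing or range-based sharding are common alternatives.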

Distributed File Systems

Distributed file systems: GFS, Colossus, Alluxio, CephFS, HDFS

  • Cluster-level, fault-tolerant, distributed file systems:
    • append only
    • not for structured data (use a database instead)
    • not optimized for small files
    • scoped to a cluster, not a data center; the data is lost when the cluster is torn down
  • HDFS is the open-source counterpart of GFS (Google File System)
  • Colossus is the successor of GFS
  • Spanner uses Colossus to store its tablets
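
The append-only property and the weak fit for small files can be illustrated with a toy chunked file. This is only a sketch under stated assumptions: `AppendOnlyFile` is a hypothetical class, and the chunk size is tiny so the behaviour is easy to follow.

```python
class AppendOnlyFile:
    """Toy append-only file stored as a list of fixed-capacity chunks."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks: list[bytearray] = []
        self.size = 0

    def append(self, data: bytes) -> int:
        """Append data at the end and return the offset where it was written."""
        offset = self.size
        pos = 0
        while pos < len(data):
            # Open a new chunk when there is none yet or the last one is full.
            if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
                self.chunks.append(bytearray())
            room = self.chunk_size - len(self.chunks[-1])
            piece = data[pos:pos + room]
            self.chunks[-1].extend(piece)
            pos += len(piece)
        self.size += len(data)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        data = b"".join(bytes(chunk) for chunk in self.chunks)
        return data[offset:offset + length]
```

There is deliberately no method for writing at an arbitrary offset. Every file also occupies at least one chunk and one metadata entry, so a large number of tiny files costs far more bookkeeping than the same bytes stored in a few large files.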

Software Defined Storage

  • Ceph: by Red Hat; an object store at its core, but it supports all 3 types (object, block, file). Suited to shorter-term storage and more frequent user access.
    • CephFS: a POSIX-compliant network file system
  • Gluster: scalable file storage with object capabilities; also by Red Hat. It should not be used for transactional workloads, such as a database, or for anything that depends on strict locking.
  • Alluxio: a virtual distributed storage system
  • MinIO: S3-compatible object storage, Kubernetes-native.
  • Rook: a storage orchestrator; it can be used with Ceph, where Ceph is the storage provider and Rook supplies the Ceph operator plus discovery.
  • VMware vSAN: creates shared storage for VMs.
  • HDFS: part of Hadoop

ONTAP

NetApp’s proprietary operating system used in storage disk arrays such as NetApp FAS/AFF...

  • FAS: Fabric-Attached Storage
    • Hybrid FAS: NetApp proprietary custom-built hardware appliances with HDD or SSD drives.
    • AFF: All Flash FAS. NetApp proprietary custom-built hardware appliances with only SSD drives and ONTAP optimized for low latency.

The terms ONTAP, AFF, ASA, and FAS are often used as synonyms.

ONTAP enables you to combine multiple physical storage controllers into a single logical cluster that can non-disruptively serve multiple storage workloads.

ASA (All SAN Array) is built on top of the AFF platform and provides only SAN-based data protocol connectivity.