- sharding strategy
- metadata storage
Three types of storage: block storage, file storage, and object storage (blob = binary large object).
The unit of each type:
- block storage: evenly sized chunks.
- file storage: a hierarchy of files in folders
- blob/object storage: immutable objects, e.g. images, audio or other multimedia objects; sometimes binary executable code is stored as a blob.
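The three units above can be contrasted with a toy sketch. It is only an illustration of how each model addresses data, not how any real storage system is implemented; the class names `BlockStore`, `FileStore`, and `ObjectStore` and the tiny 4-byte block size are hypothetical.

```python
class BlockStore:
    """Toy block storage: data is cut into evenly sized chunks addressed by number."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # block number -> fixed-size bytes

    def write(self, data):
        """Split data into blocks and return the block numbers that were used."""
        start = len(self.blocks)
        ids = []
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            block_no = start + len(ids)
            # Pad the last chunk so every block has exactly the same size.
            self.blocks[block_no] = chunk.ljust(self.block_size, b"\0")
            ids.append(block_no)
        return ids


class FileStore:
    """Toy file storage: files live in a hierarchy of folders, addressed by path."""

    def __init__(self):
        self.root = {}  # nested dicts model folders

    def write(self, path, data):
        *folders, name = path.strip("/").split("/")
        node = self.root
        for folder in folders:
            node = node.setdefault(folder, {})
        node[name] = data

    def read(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            node = node[part]
        return node


class ObjectStore:
    """Toy object storage: a flat key space of immutable blobs."""

    def __init__(self):
        self.objects = {}  # key -> bytes

    def put(self, key, data):
        if key in self.objects:
            # Objects are immutable: an existing key is never overwritten in place.
            raise KeyError(f"object {key!r} already exists")
        self.objects[key] = bytes(data)

    def get(self, key):
        return self.objects[key]


blocks = BlockStore(block_size=4)
block_ids = blocks.write(b"abcdefghij")   # three blocks, the last one padded

files = FileStore()
files.write("/photos/2020/cat.jpg", b"jpeg-bytes")

objects = ObjectStore()
objects.put("cat.jpg", b"jpeg-bytes")
```

The point of the contrast is the addressing scheme: a block number, a path through folders, or a flat key to a blob that is replaced rather than edited.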
- Online Transaction Processing (OLTP) databases
  - Facebook Graph: mission-critical core services, strong consistency
- Semi-online Light Transaction Processing (SLTP) databases
  - Facebook Messages and Facebook Time Series
- Immutable data store
  - Photos, videos, etc.
- Analytics data store
  - Data warehouse, log storage
Facebook example, adapted from a slide:

| Data | System | Bottleneck | Latency | Consistency | Durability |
|---|---|---|---|---|---|
| Facebook Graph | MySQL / TAO | Random read IOPS | few ms | quickly consistent across data centers | no data loss |
| Messages and Time Series | HBase and HDFS | Write IOPS / storage capacity | < 200 ms | consistent within a data center | no data loss |
| Photos / Videos | Haystack | Storage capacity | < 250 ms | immutable | no data loss |
| Data Warehouse | Hive / Presto / HDFS | Storage capacity | < 1 min | not consistent across data centers | no silent data loss |
Distributed file systems: GFS, Colossus, Alluxio, CephFS, HDFS
- Cluster-level, fault-tolerant distributed file systems:
  - append-only
  - not for structured data (use a database instead)
  - not optimized for small files
  - scoped to a cluster, not a data center; the data is lost when the cluster is torn down
- HDFS is an open-source file system modeled on GFS (Google File System)
- Colossus is the successor to GFS
- Spanner uses Colossus to store its tablets
- DAS: Direct-Attached Storage.
- NAS: Network-Attached Storage.
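The earlier note that cluster-level distributed file systems are not optimized for small files can be made concrete with a rough estimate of the metadata the master node has to hold. This is a back-of-the-envelope sketch under stated assumptions: a 128 MiB block size (the HDFS default in recent versions) and roughly 150 bytes of master memory per file or block object, which is a commonly quoted rule of thumb rather than a measured figure. The function name `metadata_bytes` is hypothetical.

```python
BLOCK_SIZE = 128 * 1024**2   # assumed block size: 128 MiB
BYTES_PER_OBJECT = 150       # assumed master memory per file or block object

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def metadata_bytes(total_bytes, file_size):
    """Estimate master metadata needed to store total_bytes as files of file_size."""
    num_files = ceil_div(total_bytes, file_size)
    blocks_per_file = ceil_div(file_size, BLOCK_SIZE)
    # One object for the file itself plus one per block it occupies.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

TIB = 1024**4
large_files = metadata_bytes(TIB, 128 * 1024**2)  # 1 TiB stored as 128 MiB files
small_files = metadata_bytes(TIB, 1024**2)        # 1 TiB stored as 1 MiB files

print(f"128 MiB files: {large_files / 1e6:.1f} MB of metadata")
print(f"  1 MiB files: {small_files / 1e6:.1f} MB of metadata")
```

Under these assumptions the same 1 TiB of data needs 128 times more master metadata when stored as 1 MiB files, which is one reason many small files are usually packed into larger ones before being written to such a system.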