GCP - Storage
The Four Main Categories of GCP Storage
- Object Storage: For storing unstructured files of any type and size (like photos, videos, backups, and website assets).
- Block Storage: For providing high-performance "virtual hard drives" to your virtual machines.
- File Storage: For providing a shared network drive that multiple applications can access at the same time.
- Databases: For storing structured or semi-structured data that needs to be queried, indexed, and managed (like user profiles, product catalogs, or application state).
1. Object Storage: Google Cloud Storage (GCS)
Analogy: Think of Cloud Storage as an infinitely large, highly durable, and cost-effective online filing cabinet or warehouse. You can store any kind of file ("object") in it, from a tiny text file to a multi-terabyte video.
Key Characteristics:
- Unstructured Data: Perfect for files like images, videos, audio files, backups, log files, and static website assets (HTML, CSS, JS).
- Accessed via API: You don't "mount" it like a hard drive. You access objects via a simple REST API (HTTP GET, PUT, DELETE) or through client libraries.
- Global and Scalable: You can store virtually unlimited amounts of data, and it's accessible from anywhere in the world.
- Tiered Pricing (Storage Classes): This is a key feature for cost optimization. You can choose a storage class based on how frequently you need to access the data:
  - Standard: For "hot" data that is frequently accessed (e.g., website images). Highest storage cost, lowest access cost.
  - Nearline: For "warm" data accessed less than once a month (e.g., monthly backups).
  - Coldline: For "cool" data accessed less than once every 90 days (e.g., quarterly logs).
  - Archive: For "cold" data accessed less than once a year (e.g., long-term compliance archives). Lowest storage cost, highest access cost.
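As a rough illustration of the access-frequency rule of thumb above, here is a minimal sketch that maps an expected access interval to a storage class. The function name and the exact day thresholds (30, 90, 365) are assumptions chosen to match the "once a month / every 90 days / once a year" guidance; real decisions should also weigh retrieval fees and minimum storage durations.

```python
def choose_storage_class(days_between_accesses: float) -> str:
    """Pick a storage class from the expected number of days between accesses.

    Hypothetical helper: the thresholds approximate the guidance in these
    notes and are not an official sizing rule.
    """
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    if days_between_accesses >= 365:
        return "ARCHIVE"    # touched less than once a year
    if days_between_accesses >= 90:
        return "COLDLINE"   # touched less than once every 90 days
    if days_between_accesses >= 30:
        return "NEARLINE"   # touched less than once a month
    return "STANDARD"       # hot, frequently accessed data


if __name__ == "__main__":
    for days in (1, 45, 120, 400):
        print(days, choose_storage_class(days))
```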
When to use it: The default choice for storing any kind of file-like data. It's the backbone for data lakes, backups, and serving static content.
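To make the "accessed via API" point concrete, the sketch below builds the JSON API URL for downloading an object's contents with an HTTP GET. The bucket and object names are made up for illustration, and the helper name is hypothetical. Actually sending the request would also need an OAuth 2.0 bearer token in the Authorization header unless the object is public, which this sketch leaves out.

```python
from urllib.parse import quote

# Base endpoint of the Cloud Storage JSON API.
JSON_API_BASE = "https://storage.googleapis.com/storage/v1"


def object_media_url(bucket: str, object_name: str) -> str:
    """Return the GET URL that downloads an object's data (alt=media).

    Object names may contain "/" and spaces, so the name is percent-encoded
    as a single path segment.
    """
    encoded_bucket = quote(bucket, safe="")
    encoded_object = quote(object_name, safe="")
    return f"{JSON_API_BASE}/b/{encoded_bucket}/o/{encoded_object}?alt=media"


if __name__ == "__main__":
    # "my-bucket" and the object path are placeholder names.
    print(object_media_url("my-bucket", "photos/cat 1.jpg"))
```

In practice the client libraries hide this URL construction, but it shows that every object is just an addressable HTTP resource.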
Cloud Storage FUSE: A FUSE adapter that lets you mount and access Cloud Storage buckets as local file systems, so applications can read and write objects in your bucket using standard file system semantics. The Cloud Storage FUSE CSI driver lets you use the Google Kubernetes Engine (GKE) API to consume buckets as volumes, so you can read from and write to Cloud Storage from within your Kubernetes pods.
Cloud Storage with Cloud Storage FUSE is the recommended storage solution for most AI and ML use cases because it lets you scale your data storage more cost-efficiently than file system services.
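Once a bucket is mounted, application code no longer needs a storage client at all. The sketch below assumes a bucket has already been mounted with Cloud Storage FUSE at some local directory (the mount path in the usage example is hypothetical) and simply walks it with the standard library.

```python
from pathlib import Path


def list_files_under_mount(mount_point: str, suffix: str = "") -> list[str]:
    """List files below a directory, as paths relative to that directory.

    With a Cloud Storage FUSE mount, each returned path corresponds to an
    object name in the bucket. Nothing here is specific to Cloud Storage:
    it is ordinary file system access.
    """
    root = Path(mount_point)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )


if __name__ == "__main__":
    # "/mnt/training-data" is an assumed mount point, not a default.
    mount = Path("/mnt/training-data")
    if mount.is_dir():
        print(list_files_under_mount(str(mount), suffix=".jpg"))
```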
2. Block Storage: Persistent Disk (PD)
Analogy: Think of Persistent Disk as the high-performance virtual hard drive (HDD or SSD) that you attach to your Compute Engine (GCE) virtual machine.
Key Characteristics:
- Tied to a VM: A Persistent Disk is attached to a specific Compute Engine instance and acts as its boot drive or as an additional data drive.
- Formatted with a Filesystem: It's a raw block device. You format it with a filesystem like ext4 or XFS and then mount it inside your VM's operating system.
- High Performance: It's designed for high IOPS (Input/Output Operations Per Second) and low latency, which is essential for running operating systems and databases.
- Durable and Redundant: The data is automatically replicated in the background to protect against hardware failure.
- Types of PD:
  - Standard (HDD-based): Cheaper, good for bulk storage or workloads that aren't I/O intensive.
  - Balanced (SSD-based): A great middle ground with good performance for most applications.
  - SSD / Extreme (SSD-based): Highest performance, designed for mission-critical databases and applications that need maximum IOPS.
When to use it: As the boot disk for your GCE VMs or as a data disk for a single VM that needs fast, low-latency storage (like a self-managed database).
Local SSD vs PD
- Local SSD: Physically attached to the host of a specific VM. Very fast, but the data is not durable and may be lost, for example when the VM is stopped or deleted.
- Persistent Disk (PD): Durable network storage devices that your instances can access like physical disks. Use case: a disk accessed by a single VM, or content that does not change (a disk can be attached in read-only mode to hundreds of VMs). Can be HDD (pd-standard) or SSD (pd-balanced, pd-ssd, pd-extreme).
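The trade-offs above can be summarized as a small decision helper. This is only a sketch: the function, its parameters, and the four performance labels are invented for illustration, and the mapping simply restates the guidance in these notes rather than any official sizing rule.

```python
# Performance tier -> Persistent Disk type, following the notes above.
_PD_TYPES = {
    "bulk": "pd-standard",      # HDD-based, cheap, not I/O intensive
    "general": "pd-balanced",   # SSD-based middle ground
    "high": "pd-ssd",           # SSD-based, high IOPS
    "extreme": "pd-extreme",    # SSD-based, maximum IOPS
}


def choose_block_storage(performance: str, tolerates_data_loss: bool = False) -> str:
    """Suggest a block storage option for one VM (hypothetical helper).

    Local SSD is suggested only when the workload both wants high speed
    and can afford to lose the data (scratch space, caches).
    """
    if performance not in _PD_TYPES:
        raise ValueError(f"unknown performance tier: {performance!r}")
    if tolerates_data_loss and performance in ("high", "extreme"):
        return "local-ssd"
    return _PD_TYPES[performance]


if __name__ == "__main__":
    print(choose_block_storage("general"))
    print(choose_block_storage("high", tolerates_data_loss=True))
```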
3. File Storage: Filestore
Analogy: Think of Filestore as a high-end, managed Network-Attached Storage (NAS) device in the cloud. It's a shared network drive that many different machines can connect to and use at the same time.
Key Characteristics:
- Shared Access (NFS): It provides a filesystem that can be mounted simultaneously by hundreds or thousands of GCE VMs or GKE clusters using the standard NFS (Network File System) protocol.
- Manages "File Locking": Because it's a true file system, it handles file locking, which is critical when multiple applications need to read and write to the same files without corrupting them.
- High Performance: Designed for demanding workloads like video rendering, high-performance computing (HPC), or hosting content management systems.
When to use it: When you have a group of VMs or containers that all need to access and modify the same set of files concurrently. This is a common requirement for traditional enterprise applications, media processing pipelines, and research computing.
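The file locking mentioned above is what lets several writers share one file safely. The sketch below shows the general pattern with an advisory lock from Python's standard library on Linux. It assumes the file lives on a mount that honors advisory locks (the path in the usage example is hypothetical), and every cooperating writer must use the same locking call for it to have any effect.

```python
import fcntl


def append_line_locked(path: str, line: str) -> None:
    """Append one line to a shared file while holding an exclusive lock.

    fcntl.lockf takes a POSIX advisory lock: other processes calling the
    same function block until the lock is released, so their lines do not
    interleave.
    """
    with open(path, "a", encoding="utf-8") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)      # wait for exclusive access
        try:
            f.write(line.rstrip("\n") + "\n")
            f.flush()                      # push data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)  # always release the lock


if __name__ == "__main__":
    # "/mnt/filestore/render.log" would be a file on an assumed NFS mount;
    # a local temp path is used here so the example runs anywhere on Linux.
    append_line_locked("/tmp/render.log", "frame 42 done")
```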
4. Databases
This is a huge category on its own, for storing structured and semi-structured data.
Analogy: Databases are like highly organized, intelligent, and searchable digital libraries. Instead of just storing files, they store data in a structured way that allows for complex queries, indexing, and transactions.
Key GCP Database Types:
- Cloud SQL:
  - What it is: A fully managed service for traditional relational databases (SQL).
  - Analogy: A perfectly managed library with a strict card catalog system.
  - Supports: MySQL, PostgreSQL, and SQL Server.
  - When to use: For any application that requires structured data and strong consistency (ACID compliance), like e-commerce sites, financial applications, and CRM systems.
- Cloud Spanner:
  - What it is: A globally distributed, horizontally scalable relational database.
  - Analogy: A chain of libraries that spans the entire globe but acts as a single, perfectly consistent library.
  - When to use: For massive, global applications that need the structure of a relational database but also require limitless horizontal scalability and strong consistency across continents.
- Firestore (and Firebase Realtime Database):
  - What it is: A highly scalable, serverless NoSQL document database.
  - Analogy: A massive, flexible filing system of digital documents (JSON-like) that is incredibly easy to search and syncs in real-time.
  - When to use: Perfect for mobile and web applications. It's designed for ease of use, real-time data synchronization, and offline support.
- Bigtable:
  - What it is: A massive-scale, wide-column NoSQL database.
  - Analogy: An industrial-scale warehouse designed to store and retrieve trillions of individual data points at extreme speeds.
  - When to use: For huge analytical and operational workloads, like IoT time-series data, financial market data, or large-scale personalization engines.
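The "when to use" guidance for the four database services can be condensed into a first-pass selector. This is a simplified sketch: the function, its parameters, and the three data model labels are assumptions made for illustration, and a real choice would also consider cost, latency, and operational requirements.

```python
def choose_database(data_model: str, global_scale: bool = False) -> str:
    """Suggest a GCP database service from a coarse description of the data.

    data_model is one of "relational", "document", or "wide-column"
    (labels invented for this sketch). global_scale only matters for
    relational data, where it separates Cloud Spanner from Cloud SQL.
    """
    if data_model == "relational":
        return "Cloud Spanner" if global_scale else "Cloud SQL"
    if data_model == "document":
        return "Firestore"
    if data_model == "wide-column":
        return "Bigtable"
    raise ValueError(f"unknown data model: {data_model!r}")


if __name__ == "__main__":
    print(choose_database("relational"))                     # e-commerce, CRM
    print(choose_database("relational", global_scale=True))  # global app
    print(choose_database("document"))                       # mobile/web app
    print(choose_database("wide-column"))                    # IoT time series
```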