GCP - Versus
Google Cloud Datastore vs Cloud Firestore vs Firebase Realtime Database
- Firestore is the successor of Datastore; Datastore is deprecated, and existing Datastore databases run as Firestore in Datastore mode.
- Firebase Realtime Database was the original database offering for Firebase.
- Cloud Firestore, as the name suggests, is a "joint venture" of Google Cloud and Firebase; it is the new flagship database offering of Firebase.
- Both Firestore and Realtime Database are NoSQL document databases: Realtime Database stores data as one big JSON tree, while Firestore stores JSON-like documents organized into collections.
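For a quick feel of the Firestore data model, here is a minimal sketch using the google-cloud-firestore Python client; it assumes a project with Firestore enabled and Application Default Credentials available, and the collection, document, and field names are made up for illustration.

```python
from google.cloud import firestore

# Create a client; assumes Application Default Credentials are configured.
db = firestore.Client()

# Firestore organizes JSON-like documents into collections.
doc_ref = db.collection("users").document("alice")
doc_ref.set({"name": "Alice", "signup_year": 2024, "tags": ["beta", "admin"]})

# Read the document back as a plain dict.
print(doc_ref.get().to_dict())
```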
Container Registry vs Artifact Registry
Container Registry:
- GCR is a private Docker registry backed by Cloud Storage.
- container images only.
- deprecated in favor of Artifact Registry.
- hostname: gcr.io
Artifact Registry:
- a.k.a. GCR v2
- the recommended service for container image storage and management on Google Cloud.
- extends the capabilities of Container Registry.
- support for both container images and non-container artifacts:
  - Container images: Docker, Helm
  - Language packages: Java, Node.js, Python
  - OS packages: Debian, RPM
- hostname: pkg.dev
GKE vs Kubernetes
- Kubernetes is a piece of software for container orchestration. To run it yourself, you still need to acquire hardware, configure and bootstrap the systems, install add-ons, monitor the systems, and upgrade to keep everything up to date.
- GKE is a managed service that takes care of those tasks, so you can start using the container orchestration functionality right away.
GKE vs Anthos (GDCV) vs GDCH
- GKE: all running on GCP.
- Anthos (GDCV): control plane running on GCP, data plane can be on other clouds (AWS, Azure, etc), VMware, or bare metal. Your cluster needs to be connected to GCP all the time.
- GDCH: air-gapped, meaning there is no connection to GCP at any time; both control plane and data plane run on the customer's hardware (reusing Anthos Bare Metal).
Cloud Run vs Cloud Functions vs App Engine
Google Cloud Platform has several serverless offerings:
- Cloud Run: for containers, essentially a managed Knative. Launched in 2019.
- If you worry about "vendor lock-in", Cloud Run is the best choice: code is packaged into standard (Docker) containers, and since Knative is an open source project, you can easily migrate to an on-prem environment or another cloud running Knative, without worrying about hidden differences under the hood.
- Cloud Functions: Functions as a Service.
- 1st gen launched in 2016.
- 2nd gen introduced in 2022, built on top of Cloud Run (a minimal 2nd gen HTTP function is sketched after this list).
- Improvements: longer processing time for HTTP functions (up to 60 minutes), enabling use cases like data processing pipelines and machine learning; fewer cold starts and lower latency; concurrency of up to 1,000 requests per function instance; larger instances, with up to 16 GB of memory and 4 vCPUs.
- uses Eventarc for event delivery, supporting triggers from different sources.
- App Engine: Platform as a Service. Launched in 2008, even before "serverless" became a buzzword. Probably not the best choice if you are starting new development.
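As a rough illustration of what a 2nd gen function looks like, here is a minimal HTTP function using the open-source Functions Framework for Python; the function and file names are arbitrary, and it would typically be deployed with gcloud functions deploy using the --gen2 flag.

```python
# main.py
import functions_framework


@functions_framework.http
def hello_http(request):
    """HTTP Cloud Function (2nd gen); `request` is a Flask Request object."""
    name = request.args.get("name", "world")
    return f"Hello, {name}!"
```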
Cloud Identity vs IAM vs Identity Platform
- Cloud Identity: to manage the identities of a GCP customer's employees and their devices. It is the identity provider (IdP) for GCP, and also the Identity-as-a-Service (IDaaS) solution that powers Google Workspace.
- Identity Platform: to manage the identities of a GCP customer's own customers (customer identity and access management, CIAM).
- IAM: to manage policies and to apply permissions / roles to those users and groups. Can use an external identity provider (IdP).
GCP API Gateway vs Cloud Endpoints
Both API Gateway and Cloud Endpoints are used to manage your APIs: you create APIs to be hosted on GCP and to be consumed by others.
API Gateway is a relatively new offering.
The key difference:
- API Gateway: fully managed by Google (based on Envoy), while
- Cloud Endpoints: you manage the proxy yourself. Google provides a software package called Extensible Service Proxy (ESP); ESP v1 is based on NGINX and ESP v2 on Envoy, and you need to deploy it yourself.
Google Cloud API Gateway reduces the complexity of deploying and managing APIs, and it is comparable to Amazon API Gateway and Azure API Management. I would not be surprised if Cloud Endpoints is deprecated some day.
GCP Dataflow vs Dataproc
Google Cloud Platform has 2 data processing / analytics products:
- Cloud Dataflow is the productionisation, or externalization, of Google's internal Flume.
- Cloud Dataproc is a hosted service for the popular open source projects in the Hadoop / Spark ecosystem. The two share the same origin (Google's papers) but evolved separately.
A little bit history
Hadoop was developed based on Google's The Google File System and MapReduce papers. Hadoop got its own distributed file system called HDFS, and adopted MapReduce for distributed computing. Then Hive and Pig were created to translate (and optimize) queries into MapReduce jobs, but MapReduce remained very slow to run. Spark was then born to replace MapReduce, and also to support stream processing in addition to batch jobs.
Separately, Google created its internal data pipeline tool on top of MapReduce, called FlumeJava (not the same as Apache Flume), and later moved away from MapReduce. Another project called MillWheel was created for stream processing, and has since been folded into Flume. Part of Flume was open sourced as Apache Beam.
So both Flume and Spark can be considered as the next generation Hadoop / MapReduce.
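The Dataflow programming model lives on in Apache Beam. A minimal Beam pipeline in Python looks roughly like the sketch below; with no pipeline options it runs locally on the DirectRunner, and the same code can target Cloud Dataflow by switching the runner.

```python
import apache_beam as beam

# A tiny word-count-style pipeline.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Create" >> beam.Create(["to be or not to be"])
        | "Split" >> beam.FlatMap(str.split)
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```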
Which one to use
- If you want to migrate your existing Hadoop / Spark cluster to the cloud, or take advantage of the many well-trained Hadoop / Spark engineers out there in the market, choose Cloud Dataproc.
- If you trust Google's expertise in large-scale data processing and want their latest improvements for free, choose Cloud Dataflow.
Marketplace services vs Managed services
- Marketplace services: the user gets a container and a guide only, and needs to manage the whole lifecycle on their own; no SLA; no data backup.
- Managed services: all taken care of by GCP.
Asset vs Resource
Asset is a broader term, which includes resources (e.g. Compute Engine instances), policies (e.g. IAM policies, organization policies), runtime information (e.g. OS inventory), GKE resources, and resources from other clouds (e.g. AWS).
The Cloud Asset Inventory API (google.cloud.asset.v1) can be used to access asset metadata.
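A minimal sketch of listing assets with the Python client; the parent scope and asset type below are placeholders, and a folder or organization can also be used as the parent.

```python
from google.cloud import asset_v1

client = asset_v1.AssetServiceClient()

# List Compute Engine instances under a project (placeholder project id).
response = client.list_assets(
    request={
        "parent": "projects/my-project-id",
        "asset_types": ["compute.googleapis.com/Instance"],
        "content_type": asset_v1.ContentType.RESOURCE,
    }
)
for asset in response:
    print(asset.name)
```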
Gemini vs Vertex AI
- Gemini is a family of large language models (LLM).
- Vertex AI is the infrastructure, as part of GCP, for running models and building AI applications.
- Gemini can be used in Vertex AI; Gemini can be accessed from https://gemini.google.com/; Gemini is also integrated with Google 1st party products like Google Workspace.
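A minimal sketch of calling Gemini through Vertex AI with the Python SDK; the project, region, and model name are placeholders, and the SDK surface has been evolving, so check the current docs.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Initialize the Vertex AI SDK for a given project and region (placeholders).
vertexai.init(project="my-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")  # model name is just an example
response = model.generate_content("Summarize the difference between Gemini and Vertex AI.")
print(response.text)
```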
Service Account vs Service Agent
Service Agent is one kind of Service Account. A Service Account can be:
- user-managed service account:
  - managed by users.
  - the identity of the workload in GCP.
  - for user-owned resources accessing other GCP resources.
  - for example, the user owns a GCE instance and a GCS bucket; a service account can be attached to the GCE instance, and it will be used to authn and authz when writing to the GCS bucket.
  - looks like sa-name@1234.iam.gserviceaccount.com, where 1234 is the project id; it is on the right side of the @, indicating that the service account is owned by this user project.
- service agent:
  - managed by Google Cloud.
  - allows GCP services to access resources on your behalf.
  - for example, the user owns a GCS bucket; Cloud Logging wants to write logs to the bucket, and it can use a service agent to get access.
  - looks something like service-org-5678@gcp-sa-ciem.iam.gserviceaccount.com, where 5678 is the org id; it is on the left side of the @, indicating that the org is not the owner, and ciem is on the right side, so this is a CIEM service agent for the org 5678.
Producer Project vs Consumer Project vs Tenant Project
- Producer Project: a project owned by the API producer team. It can be a GCP service or a 3rd-party service using Service Infrastructure.
- Consumer Project: a.k.a. customer project, user project; the project that is used to access a service API.
- Tenant Project: a project owned by the service producer but containing customer resources. Not visible to consumers directly. A tenant project may be mapped to one or many consumer projects. For example, Cloud SQL is the service, but the underlying resources (like VMs) are provisioned in the tenant project and are invisible to the customers.
There are also host projects and service projects: in the Shared VPC model, you designate a project as a host project and attach one or more other service projects to it. The VPC networks in the host project are called Shared VPC networks. Eligible resources from service projects can use subnets in the Shared VPC network. The host project owns the networks in the VPC, which are then used by the service projects.
Cloud SQL vs AlloyDB vs Cloud Spanner
There are multiple ways to have a PostgreSQL (compatible) database on Google Cloud Platform:
- Manual: Install and manage your own PostgreSQL instance on a VM or bare-metal machine.
- Cloud SQL: runs the actual PostgreSQL. Cloud SQL database instances run on GCE VMs in a Google-owned project, and the customer's database is stored on a GCE persistent disk (PD).
- AlloyDB: PostgreSQL-compatible (i.e. it is NOT PostgreSQL), utilizing Google's infrastructure; better performance and more expensive.
- Spanner: offers a PostgreSQL interface but does not promise 100% compatibility.
Note that only manually set up instances and Cloud SQL run the "real" PostgreSQL; AlloyDB and Spanner are not real PostgreSQL, they are just compatible: AlloyDB is 100% compatible while Spanner is only partially compatible.
Spanner provides strong consistency across all regions, whereas AlloyDB is eventually consistent due to the way it handles replication.
AlloyDB is comparable to Amazon Aurora.
Spanner vs Cloud Spanner
Spanner is the internal version, and Cloud Spanner is a service on GCP that GCP customers can use.
Under the hood they use the same database, but Cloud Spanner adds a layer that hides some internal features and adds cloud-specific features, so feature-wise they are not identical, and neither is a superset of the other.
Persistent Disk Snapshot vs Disk Image
Persistent Disk Snapshot
- Primary Purpose: Backup and Recovery. Snapshots are designed to create point-in-time backups of your persistent disks (either boot disks or data disks).
- Nature:
- Incremental: By default, after the first full snapshot of a disk, subsequent snapshots are incremental. They only store the blocks that have changed since the previous snapshot, making them faster to create and more storage-efficient.
- Point-in-Time: Captures the exact state of the disk blocks at the moment the snapshot creation begins.
- Source: Created from an existing Persistent Disk attached to a VM (the VM can be running or stopped).
- Use Cases:
- Regularly backing up critical data.
- Recovering a disk to a previous known good state (by creating a new disk from the snapshot).
- Migrating a disk's data to a new zone, region, or project (by creating a new disk from the snapshot in the target location).
- Protecting against accidental data deletion or corruption.
- Lifecycle: Often created on a regular schedule (e.g., daily, weekly) for ongoing protection. Older snapshots might be deleted based on a retention policy.
- Cost: You pay for the storage consumed by the snapshot data (which is often less than the source disk size due to incrementality and compression). Costs vary between standard, instant, and archive snapshot types.
Disk Image
- Primary Purpose: Template for creating new disks, especially boot disks for new VM instances.
- Nature:
- Self-Contained Template: Contains the entire filesystem and data necessary to create a new disk, often including a bootable operating system, configurations, and pre-installed software.
- Full Copy (Generally): Usually represents a full copy of the source data at the time of creation (though the underlying storage might be optimized).
- Source: Can be created from multiple sources:
- An existing Persistent Disk (boot or data).
- A Snapshot.
- Another Image.
- A raw disk file (e.g., disk.raw) stored in Google Cloud Storage.
- Virtual disk files (VMDK, VHD, etc.) imported from other platforms (like VMware, VirtualBox, AWS, Azure).
- Use Cases:
- Creating new VM instances with a pre-configured operating system and software stack ("golden image").
- Standardizing VM deployments across your organization.
- Sharing bootable environments with other projects or users (using IAM permissions).
- Importing existing virtual machines from on-premises or other clouds into Google Cloud.
- Lifecycle: Created when a new baseline or template is needed. Often versioned using "Image Families" to point to the latest recommended version.
- Cost: You pay for the storage consumed by the image size for the duration it's stored.
Here's a table summarizing the key differences:
| Feature | Disk Snapshot | Disk Image |
|---|---|---|
| Primary Goal | Backup & Recovery | Template & Boot Source |
| Nature | Incremental (usually), Point-in-time | Full Copy (usually), Self-contained |
| Main Use | Restore disk, Create data copy | Create new boot/data disks for VMs |
| Created From | Persistent Disk | Disk, Snapshot, Image, GCS File, Import |
| Typical Data | Raw disk blocks at a specific time | Bootable OS, Apps, Configs (often) |
| Lifecycle | Frequent, Scheduled (often) | Infrequent, As needed for new templates |
In essence:
- Use Snapshots for backing up your live disks regularly so you can recover from data loss or revert changes.
- Use Images to create standardized templates, especially boot disks, for launching new, pre-configured VM instances quickly and consistently.
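As a rough sketch of the two workflows with the google-cloud-compute Python client (project, zone, and resource names are placeholders; both calls return long-running operations that real code should wait on and handle errors for):

```python
from google.cloud import compute_v1

project, zone = "my-project-id", "us-central1-a"  # placeholders

# 1) Snapshot an existing persistent disk (backup / point-in-time copy).
disks = compute_v1.DisksClient()
snap_op = disks.create_snapshot(
    project=project,
    zone=zone,
    disk="my-data-disk",
    snapshot_resource=compute_v1.Snapshot(name="my-data-disk-snap-1"),
)
snap_op.result()  # wait for the snapshot operation to finish

# 2) Build a reusable image from that snapshot (template for new disks).
images = compute_v1.ImagesClient()
image_op = images.insert(
    project=project,
    image_resource=compute_v1.Image(
        name="golden-image-v1",
        source_snapshot=f"projects/{project}/global/snapshots/my-data-disk-snap-1",
    ),
)
image_op.result()
```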
CSEK vs CMEK
CSEK (Customer-Supplied Encryption Keys)
- Concept: You create and manage your own encryption keys entirely outside of Google Cloud. When you interact with supported GCP services (like reading or writing an object in Cloud Storage or attaching a Compute Engine disk), you provide your AES-256 key along with each API request.
- Key Management: You are fully responsible for generating, securing, storing, rotating, and backing up these keys externally.
- GCP Interaction: Google Cloud uses the key you provide only for the duration of the operation to encrypt or decrypt the data. Google does not store your key persistently on its servers; it only stores a hash of the key to validate future requests.
- Control: You have absolute control over the key material because it never permanently resides within GCP.
- Responsibility: You bear the complete responsibility. If you lose your key, the data encrypted with it becomes permanently irrecoverable.
- Supported Services: Primarily Cloud Storage objects and Compute Engine persistent disks/local SSDs. It's not widely supported across all GCP services (e.g., not supported by BigQuery or Cloud SQL).
- Use Case: Typically used when strict regulatory or policy requirements demand that encryption keys are never stored by the cloud provider and remain entirely under the customer's external control. It adds significant operational overhead.
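A minimal sketch of CSEK with Cloud Storage in Python; the bucket and object names are placeholders, and the key must be a raw 32-byte AES-256 key that you generate and keep yourself.

```python
import os
from google.cloud import storage

# You generate and keep this key; Google only stores a hash of it.
raw_key = os.urandom(32)  # 32-byte AES-256 key -- persist it somewhere safe!

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder

# The key must be supplied on every operation that touches the object.
blob = bucket.blob("secret.txt", encryption_key=raw_key)
blob.upload_from_string("hello")

# Reading the object back requires the same key.
print(bucket.blob("secret.txt", encryption_key=raw_key).download_as_text())
```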
CMEK (Customer-Managed Encryption Keys)
- Concept: You use Google Cloud Key Management Service (Cloud KMS) to create, manage, and control your encryption keys within Google Cloud. You then configure supported GCP services to use specific keys stored in Cloud KMS to protect their data at rest.
- Key Management: Keys are managed centrally through Cloud KMS. You control the key's lifecycle (creation, rotation schedule, disabling, destruction) and access permissions (using IAM). Google manages the underlying secure infrastructure of Cloud KMS.
- GCP Interaction: Supported GCP services (like Cloud Storage, BigQuery, Compute Engine, Cloud SQL, etc.) are configured to use a specific KMS key. When these services need to encrypt or decrypt data, they make an internal call to the Cloud KMS API using the key you designated, leveraging GCP's IAM controls. The raw key material generally doesn't leave the KMS boundary.
- Control: You manage the keys' policies and lifecycle via Cloud KMS within GCP. You can grant/revoke permissions and disable/destroy keys to control access to the data encrypted by them.
- Responsibility: This is a shared responsibility model. You manage the key policies and lifecycle in KMS; Google secures and manages the KMS infrastructure.
- Supported Services: Has broad support across many GCP services (Cloud Storage, Compute Engine disks/snapshots/images, BigQuery, Cloud SQL, Pub/Sub, Bigtable, Spanner, Artifact Registry, Cloud Logging, etc. - see the official list for details).
- Use Case: Used when you need more control over encryption keys than Google's default encryption provides, require centralized key management within GCP, need auditable key usage logs (via Cloud Audit Logs), or must meet compliance requirements mandating customer-managed keys (but allowing them to be managed within the cloud provider's KMS).
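A minimal sketch of CMEK with Cloud Storage in Python, setting a Cloud KMS key as a bucket's default encryption key; the bucket and key names are placeholders, and the Cloud Storage service agent must already have the roles/cloudkms.cryptoKeyEncrypterDecrypter role on the key.

```python
from google.cloud import storage

# Full resource name of a key you manage in Cloud KMS (placeholder values).
kms_key = "projects/my-project-id/locations/us/keyRings/my-ring/cryptoKeys/my-key"

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder

# New objects written to this bucket will be encrypted with the KMS key by default.
bucket.default_kms_key_name = kms_key
bucket.patch()

# A specific KMS key can also be chosen per object.
blob = bucket.blob("report.csv", kms_key_name=kms_key)
blob.upload_from_string("col1,col2\n1,2\n")
```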
Key Differences Summarized:
| Feature | CSEK (Customer-Supplied) | CMEK (Customer-Managed) |
|---|---|---|
| Key Location | Outside GCP (Managed by Customer) | Inside Google Cloud KMS (Managed by Customer) |
| Key Management | Customer's full external responsibility | Via Cloud KMS within GCP |
| GCP Stores Key? | No (only a hash for validation) | Yes (securely within Cloud KMS) |
| Provided To GCP | With every API request needing the key | Configured once per resource; used via KMS API |
| Service Support | Limited (mainly GCS objects, GCE disks) | Broad (many GCP services) |
| Control Level | Absolute (external) | Granular (via KMS policies, IAM) within GCP |
| Responsibility | Fully Customer | Shared (Customer manages key config, Google KMS infra) |
| Key Loss Impact | Permanent data loss | Data inaccessible (until key restored/re-enabled) |
| Integration | Operationally complex, per-request | Easier, integrated via resource config & KMS |
In short: Choose CSEK if you absolutely must keep your keys entirely outside Google Cloud and manage them yourself, accepting the operational complexity and limited service support. Choose CMEK if you need enhanced control, auditability, and centralized management of keys within GCP across a wide range of services, leveraging Cloud KMS.