GCP - Open Source Dependencies

Google Cloud Platform (GCP) has a unique relationship with open source. Many of its core services are "managed versions" of open-source projects that Google either created and donated to the community (like Kubernetes) or heavily adopted to ensure industry compatibility.

Here is a list of the primary open-source projects that GCP depends on, categorized by how they are used.

1. The Core Infrastructure & Orchestration

These projects form the literal backbone of how GCP manages containers and serverless workloads.

  • Kubernetes (K8s): The foundation of Google Kubernetes Engine (GKE). Google designed Kubernetes using lessons from Borg, its internal cluster manager, then open-sourced it and donated it to the Cloud Native Computing Foundation (CNCF).
  • Knative: An open-source serverless layer for Kubernetes that Google started. Cloud Run is compatible with the Knative Serving API, including its "scale-to-zero" model for serverless containers.
  • Istio: Underpins Cloud Service Mesh, the successor to Anthos Service Mesh. It manages traffic, security, and observability between microservices.
  • Linux Kernel: Almost all GCP compute instances (Compute Engine) and managed services run on highly optimized versions of the Linux kernel.
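
The "scale-to-zero" idea mentioned for Knative can be illustrated with a toy calculation. This is a minimal sketch, not Knative's or Cloud Run's actual autoscaler: the function name `desired_replicas`, the default concurrency target of 10, and the cap of 100 instances are all assumptions made for illustration.

```python
import math


def desired_replicas(in_flight_requests: int,
                     target_concurrency: int = 10,
                     max_replicas: int = 100) -> int:
    """Return how many container instances a toy autoscaler would request.

    Hypothetical helper for illustration only: real autoscalers also
    average load over time windows and account for cold starts.
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    if in_flight_requests <= 0:
        # No traffic: release every instance ("scale to zero").
        return 0
    # Enough instances that each one handles at most target_concurrency requests.
    needed = math.ceil(in_flight_requests / target_concurrency)
    return min(needed, max_replicas)
```

With these assumed defaults, 25 concurrent requests would call for 3 instances, and an idle service would call for none.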

2. Networking and Communication

GCP relies on these protocols, proxies, and libraries to allow different services and APIs to talk to each other.

  • gRPC: A high-performance RPC framework that Google open-sourced. Most Google Cloud APIs expose a gRPC interface alongside REST, and the Cloud Client Libraries often use gRPC as their transport.
  • Protocol Buffers (Protobuf): The language-neutral mechanism for serializing structured data used by GCP APIs.
  • Envoy: A high-performance proxy. Google uses Envoy extensively within its Cloud Load Balancing stack and Service Mesh.
  • OpenSSL / BoringSSL: Google maintains BoringSSL (a fork of OpenSSL) to handle the encryption (TLS/SSL) for almost all GCP traffic.
  • Cilium / eBPF: GKE Dataplane V2 is built on Cilium, an open-source networking project that uses eBPF in the Linux kernel.
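
To make the Protocol Buffers serialization concrete, here is a small sketch of how a single integer field is laid out on the wire. It is a hand-written illustration of the documented wire format, not the official `protobuf` library, and the helper names `encode_varint` and `encode_int_field` are made up for this example. It covers only non-negative integers.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F  # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)  # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field: a tag followed by the value."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint
    return encode_varint(tag) + encode_varint(value)


print(encode_int_field(1, 150).hex())  # 089601
```

The three bytes are the tag for field 1 (`08`) followed by the two-byte varint for 150 (`96 01`), which is why small numbers stay compact on the wire.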

3. Data Processing and Analytics

Many of GCP’s "Big Data" services are managed offerings of Apache Software Foundation projects.

  • Apache Beam: The programming model for Cloud Dataflow. Google donated the Dataflow SDK to the Apache Software Foundation to create Beam.
  • Apache Airflow: The engine behind Cloud Composer. It is used to author and schedule complex workflows.
  • Apache Spark / Hadoop / Hive: These form the core of Dataproc, which is GCP’s managed big data cluster service.
  • Apache Kafka: GCP recently launched a Managed Service for Apache Kafka, and it also provides connectors for Kafka to interact with Pub/Sub.
  • Apache Arrow: A columnar in-memory data format. The BigQuery Storage Read API can return results in Arrow format, which speeds up data transfers between BigQuery and other analytics tools.
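
Workflow engines such as Airflow model a pipeline as a directed acyclic graph (DAG) of tasks. The sketch below does not use Airflow itself; it relies on Python's standard `graphlib` module to show how a run order falls out of declared dependencies. The task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order in which every dependency comes first."""
    return list(TopologicalSorter(dag).static_order())


print(execution_order(workflow))
```

A real scheduler adds retries, timing, and parallel execution on top of this ordering step, but the dependency graph is the core of the model.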

4. Databases and Caching

GCP offers managed versions of these popular open-source storage engines.

  • PostgreSQL & MySQL: The primary engines offered via Cloud SQL.
  • Redis & Memcached: The engines that power Memorystore, GCP's in-memory data store.
  • Prometheus: A monitoring system with its own time-series database. Google Cloud Managed Service for Prometheus lets users monitor GKE workloads with standard Prometheus tooling.

5. AI and Machine Learning

The AI services in GCP (Vertex AI) are built to support the most popular open-source data science frameworks.

  • TensorFlow: Originally developed by Google Brain, it is a first-class framework for many GCP AI services and for TPUs (Tensor Processing Units).
  • PyTorch / JAX: Heavily supported and optimized for use on Vertex AI and Deep Learning VMs.
  • Kubeflow: Used to build and deploy ML pipelines on top of Kubernetes.

6. Security and Policy

GCP uses these to ensure "Zero Trust" and secure resource management.

  • Open Policy Agent (OPA) & Gatekeeper: Policy Controller for GKE is built on OPA Gatekeeper and uses it to enforce compliance policies on clusters.
  • SPIFFE / SPIRE: Used for "Workload Identity," allowing different services to prove who they are without using static passwords or keys.
    • The Agent Identity feature in Vertex AI Agent Engine (Reasoning Engine) is built upon the SPIFFE framework.
    • SPIRE is an open-source implementation of the SPIFFE APIs. GCP provides its own mechanisms for provisioning SPIFFE-based identities.
  • Tink: A multi-language cryptography library used across GCP SDKs to ensure encryption is implemented correctly.
  • Falco: A CNCF runtime threat-detection project that customers often deploy on GKE alongside Google Cloud's own security tooling.
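
A SPIFFE identity is a URI of the form `spiffe://<trust-domain>/<workload-path>`. The sketch below shows how such an ID splits into its two parts. It is a simplified check written for illustration, not an official SPIFFE library; the function name `parse_spiffe_id` and the example ID are assumptions, and the full specification enforces stricter character rules than this code does.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, workload_path).

    Simplified validation only; the SPIFFE specification is stricter.
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError("SPIFFE IDs must use the spiffe:// scheme")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("trust domain must not contain user info or a port")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs must not carry a query or fragment")
    return parts.netloc, parts.path


# Hypothetical ID for a billing service in a "prod" namespace.
print(parse_spiffe_id("spiffe://example.org/ns/prod/sa/billing"))
```

Because the identity is carried in a short-lived certificate or token rather than a shared secret, a receiving service only needs to verify the credential and compare the trust domain and path against its policy.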

7. Management and Observability

The tools developers use to interact with GCP are largely open-source.

  • Terraform: Terraform itself is a HashiCorp product, but the Google Cloud provider for Terraform is open source and is a primary way customers manage "Infrastructure as Code" on GCP.
  • OpenTelemetry (OTel): The industry standard for traces, metrics, and logs. Google is a major contributor, and Cloud Trace and Cloud Monitoring can ingest OTel data.
  • Go (Golang): Created at Google, Go is the implementation language of Kubernetes, Istio, and the Google Cloud provider for Terraform, and of much GCP tooling. The gcloud CLI itself is written mainly in Python.

Summary: The "Managed Service" Loop

A recurring pattern in GCP is:

  1. Google creates a tool (Kubernetes, Beam, gRPC, TensorFlow).
  2. Google open-sources it and, in many cases, donates it to a foundation (CNCF, Apache Software Foundation).
  3. GCP sells a "Managed" service built on it (GKE, Dataflow, Vertex AI).

This allows GCP to benefit from the community's bug fixes and integrations while reducing the risk that customers are "locked in" to a proprietary technology that only exists on Google Cloud.