GCP - Colab
To understand how Google Colab works, we have to distinguish between the Standard (Consumer) Colab and Colab Enterprise (which is part of Google Cloud's Vertex AI).
Under the hood, Colab is entirely based on Google Compute Engine (GCE). Every time you "Connect" to a runtime, a GCE virtual machine is being provisioned for you.
The Core Infrastructure: GCE
When you click Connect, the Colab backend triggers a request to the GCE API to spin up a VM instance.
- The Image: These VMs run a specialized version of the Google Deep Learning VM Image. This image comes pre-installed with Linux (Ubuntu), Python, and hundreds of libraries like TensorFlow, PyTorch, and JAX.
- Virtualization: Colab uses gVisor (a container sandboxing technology) or standard Docker containers on top of GCE to ensure that your code is isolated from other users' code, even if they are on the same physical hardware.
- Hardware: Depending on your tier (Free vs. Pro vs. Enterprise), Colab attaches specific GCE hardware resources, such as NVIDIA T4/A100 GPUs or TPUs.
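You can probe which hardware tier you actually landed on from inside the runtime. This is a hedged sketch, not an official API: it assumes the standard `nvidia-smi` tool is on the PATH for GPU runtimes and that classic Colab TPU runtimes expose the `COLAB_TPU_ADDR` environment variable.

```python
import os
import shutil
import subprocess

# Hypothetical helper: probe which accelerator (if any) is attached to
# the current runtime by looking for vendor tooling and env vars.
def detect_accelerator() -> str:
    if shutil.which("nvidia-smi"):  # NVIDIA GPUs (e.g., T4, A100)
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
        return out.stdout.strip() or "NVIDIA GPU"
    if os.environ.get("COLAB_TPU_ADDR"):  # set on classic Colab TPU runtimes
        return "TPU"
    return "CPU only"

print(detect_accelerator())
```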
Are the VMs shared?
No. Google manages a pool of these runtimes within its own Google Cloud project, but each time you connect to a managed runtime, a dedicated VM is provisioned for your use. Your code and processes run only within your assigned VM, isolated from other users' environments.
If so, would it take a long time to boot up the VM?
To provide a fast connection experience, Colab uses a pool of pre-created VMs. When a user initiates a session with a managed runtime, a VM from this pool is assigned to them.
When would the VM be deallocated?
VMs are reclaimed and recycled based on several factors:
- Inactivity: VMs are typically deallocated after a period of user inactivity to free up resources. While the exact timeouts can vary and are managed by the service, idle runtimes do not persist indefinitely.
- Session Limits: There are limits on the maximum duration a single session can run on a managed VM, after which the VM is recycled.
- Resource Management: Colab's backend actively manages the pool of VMs. Systems are in place to detect and clean up "stale" VMs on a regular "VM pool maintenance interval," ensuring efficient use of compute resources.
So my notebook may run on different VMs from time to time.
Yes, in a Colab managed runtime, if your session is interrupted or ends, a new session will run on a different VM instance.
So the VM is not tied to my identity?
The GCE VMs in these managed runtimes operate using a service account controlled by the Colab team's GCP project. This service account is not directly tied to your individual Google user identity.
Then how does auth work?
When you, as a user, interact with Google Cloud services from within a Colab managed runtime (e.g., using gcloud, the google-cloud-python client libraries, or google.colab.auth), the authentication process typically involves an OAuth flow.
You are prompted to authorize Colab to access GCP on your behalf. This authorization links your user identity to the runtime session.
Once authorized, the runtime environment can obtain access tokens that carry your permissions, allowing it to interact with GCP resources you have access to.
For authentication, use:

```python
from google.colab import auth
auth.authenticate_user()
```

Once you finish the auth flow, the credential is stored on the VM. You can check where it lives with:

```python
import os
print(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'))
```
When your Python code (or any other application) in the Colab notebook calls a Google API (like BigQuery, Cloud Storage, or Vertex AI), the Google client libraries automatically follow a lookup process. On a GCE VM, the client libraries will look at the Metadata Server to fetch a short-lived access token corresponding to the VM's Service Account.
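The lookup process described above can be sketched in plain Python. This is only a simplified illustration of what `google.auth.default()` does internally, not the real implementation:

```python
import os
from pathlib import Path

# Simplified illustration of the Application Default Credentials (ADC)
# lookup order that Google client libraries follow; the real logic
# lives in google.auth.default(), this is only a sketch.
def adc_lookup_order() -> list:
    steps = []
    # 1. A key file explicitly pointed to by the environment variable
    if os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        steps.append("env var GOOGLE_APPLICATION_CREDENTIALS")
    # 2. The well-known gcloud application default credentials file
    well_known = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.exists():
        steps.append(str(well_known))
    # 3. On a GCE VM, fall back to the metadata server, which vends
    #    short-lived tokens for the VM's service account
    steps.append("metadata server (metadata.google.internal)")
    return steps

print(adc_lookup_order())
```

The metadata-server fallback is what makes code "just work" on GCE without any key files on disk.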
Standard Colab vs. Colab Enterprise
The main difference is who owns the GCE project.
Standard/Pro Colab (Consumer)
- "Invisible" GCE: The VMs live in a Google-owned project, not your GCP project. You cannot see these VMs in your Google Cloud Console.
- Ephemeral: When you disconnect or the timeout hits, the GCE instance is deleted. Any data not saved to Google Drive is lost.
- Shared Pool: You are pulling from a massive "warm pool" of GCE instances, which keeps connection times fast.
Colab Enterprise (Vertex AI)
- "Visible" GCE: The VMs run inside your GCP project. You can see them in the GCE console.
- Persistence: You can use "Runtime Templates." This allows you to specify exactly what machine type you want (e.g., an n1-standard-8 with 2 GPUs) and keep it running as long as you want.
- VPC Integration: Because it is your GCE instance, the Colab notebook can "talk" to other resources in your VPC, like a private Cloud SQL database or BigQuery, using internal IP addresses.
How the "Wiring" Works (The Data Path)
When you type print("Hello") and hit Shift+Enter, the flow looks like this:
- Frontend (Browser): Your browser sends a JSON payload via WebSockets.
- Colab Gateway: Google's proxy layer receives the request and routes it to the specific GCE instance assigned to your session.
- The Kernel: Inside the GCE VM, a Jupyter Kernel (the Python execution engine) receives the code.
- Execution: The kernel runs the code on the CPU/GPU.
- Return: The output (text, images, or plots) is sent back through the gateway to your browser.
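The JSON payload in step 1 follows the Jupyter messaging protocol. Here is a hedged sketch of what a single execute_request message might look like; the field names come from the protocol spec, while Colab's gateway framing and message signing are simplified away:

```python
import json
import uuid
from datetime import datetime, timezone

# Sketch of the JSON payload the browser sends for one cell execution.
# Field names follow the Jupyter messaging protocol; Colab's gateway
# framing and signing are simplified away.
def make_execute_request(code: str) -> str:
    msg = {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "msg_type": "execute_request",
            "session": uuid.uuid4().hex,
            "date": datetime.now(timezone.utc).isoformat(),
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,  # the cell source, e.g. print("Hello")
            "silent": False,
            "store_history": True,
        },
    }
    return json.dumps(msg)

print(make_execute_request('print("Hello")'))
```

The kernel's replies (execute_reply, stream output, display_data) travel back over the same WebSocket with the original msg_id echoed in their parent_header.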
Storage and Data
Since GCE instances are essentially remote computers, storage works in three ways:
- Local Disk: The GCE instance has a local boot disk (usually around 100GB). This is fast but temporary. If the VM is deleted, this data dies.
- FUSE Mounts: When you "Mount Google Drive," Colab uses a technology called FUSE (Filesystem in Userspace) to make your cloud-based Google Drive appear like a local folder (/content/drive) inside the GCE VM.
- Cloud Storage (GCS): In the Enterprise version, it is common to use gsutil or the Python GCS client to stream data directly from buckets into the GCE VM's memory.
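The local boot disk mentioned above can be inspected from inside any runtime with nothing but the standard library:

```python
import shutil

# Inspect the runtime's local boot disk; on a managed Colab VM this
# typically reports on the order of 100 GB mounted at /.
total, used, free = shutil.disk_usage("/")
gib = 1024 ** 3
print(f"total: {total // gib} GiB, used: {used // gib} GiB, free: {free // gib} GiB")
```

Remember that whatever this reports, it vanishes with the VM; only Drive mounts and GCS buckets survive a recycle.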
Can you use your own "External" GCE?
Yes. Colab has a feature called "Connect to a local runtime" or "Connect to a custom GCE VM."
- You can spin up a high-end GCE instance manually in your console.
- Install the jupyter_http_over_ws extension on it.
- Tell the Colab frontend to use that specific IP address instead of Google's managed pool.
- Result: You get the Colab UI (the nice notebook interface) but the raw power and persistent disk of your own dedicated GCE server.
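The steps above boil down to starting a Jupyter server that allows the Colab frontend as an origin. This sketch only assembles and prints the launch command (flags as documented in Colab's local/custom runtime instructions); you would run the printed command on your own VM after installing the jupyter_http_over_ws extension:

```python
# Sketch of the server launch command for a custom/local runtime.
# Running it starts a Jupyter server whose token URL you paste into
# Colab's "Connect to a local runtime" dialog.
cmd = [
    "jupyter", "notebook",
    "--NotebookApp.allow_origin=https://colab.research.google.com",
    "--port=8888",
    "--NotebookApp.port_retries=0",
]
print(" ".join(cmd))
```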
Summary
Colab is essentially a managed SaaS wrapper around Google Compute Engine. It abstracts away the SSHing, the driver installation, and the networking setup, providing a "Serverless" experience for what is actually a traditional GCE Virtual Machine.