Kubernetes - PodSandbox

The CRI (Container Runtime Interface) Layer

"Pod" is not a native concept in core containerd.

While the core of containerd deals with containers, it also ships a CRI plugin (originally the separate cri-containerd project). This plugin is what the Kubelet talks to. The CRI explicitly defines two types of objects:

  1. PodSandboxes
  2. Containers

When you deploy a Pod in Kubernetes, the Kubelet sends containerd two kinds of requests: first one to create the Pod's sandbox, then one or more to create the containers inside it.
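
The ordering of these requests can be sketched with a toy in-memory model. RunPodSandbox and CreateContainer are named in this article, and StartContainer is the CRI call that follows creation. The ToyRuntime class, its method names, and the ID format are hypothetical and are not containerd's API; they only illustrate that a container is always created inside a sandbox that already exists.

```python
import itertools


class ToyRuntime:
    """Hypothetical in-memory stand-in for a CRI runtime (not containerd's API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.sandboxes = {}   # sandbox_id -> list of container IDs
        self.containers = {}  # container_id -> state string
        self.calls = []       # ordered log of CRI-style calls

    def run_pod_sandbox(self, pod_name):
        # Mirrors RunPodSandbox: the sandbox exists before any container does.
        sandbox_id = f"sandbox-{next(self._ids)}"
        self.sandboxes[sandbox_id] = []
        self.calls.append(("RunPodSandbox", pod_name))
        return sandbox_id

    def create_container(self, sandbox_id, name):
        # Mirrors CreateContainer: the request names the sandbox it belongs to.
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox: {sandbox_id}")
        container_id = f"container-{next(self._ids)}"
        self.sandboxes[sandbox_id].append(container_id)
        self.containers[container_id] = "created"
        self.calls.append(("CreateContainer", name))
        return container_id

    def start_container(self, container_id):
        # Mirrors StartContainer: creating and starting are separate steps.
        self.containers[container_id] = "running"
        self.calls.append(("StartContainer", container_id))


runtime = ToyRuntime()
sandbox = runtime.run_pod_sandbox("my-pod")
for name in ("app-a", "app-b"):
    runtime.start_container(runtime.create_container(sandbox, name))
print([call[0] for call in runtime.calls])
```

One sandbox request is followed by a create/start pair per container, and asking for a container in an unknown sandbox fails.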

Using gVisor (runsc)

Step 1: RunPodSandbox

The Kubelet tells containerd: "I need a new Pod Sandbox with this ID (e.g., Pod_123)."

  • containerd calls gVisor (runsc).
  • gVisor starts the Sentry process.
  • In gVisor's world, this Sentry process is the sandbox. It starts waiting for instructions but isn't running your application code yet.
  • At this stage, a Gofer is usually started for the sandbox's own root ("pause") container, so the sandbox has a basic filesystem to run on. Virtual filesystems such as /dev and /proc are provided by the Sentry itself.

Step 2: CreateContainer

The Kubelet then tells containerd: "Now, create Container A and put it inside Sandbox Pod_123."

  • containerd looks up the existing sandbox (Pod_123).
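  • It invokes runsc again, but this time the container's OCI spec carries annotations set by the CRI plugin (the container type and the sandbox ID) that say: "This is not a new sandbox; this is a container belonging to the existing Sentry Pod_123." The Runtime Handler only selects which runtime (runc or runsc) handles the Pod; it does not mark a container as belonging to a sandbox.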
  • gVisor then:
    1. Starts a new Gofer specifically for Container A's root filesystem.
    2. The Sentry (which was already running) opens a connection to this new Gofer.
    3. The Sentry creates a new internal "Task" (a process) using the files provided by that Gofer.
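
The "new sandbox or container in an existing sandbox?" decision can be sketched as a lookup on the OCI spec's annotations. This is a minimal sketch, not runsc's code: the two annotation keys are the ones containerd's CRI plugin is understood to set and should be verified against your containerd version, and classify_spec, along with its choice to treat an unannotated spec as a sandbox, is an assumption of this example.

```python
from typing import Optional, Tuple

# Annotation keys assumed to be set by containerd's CRI plugin; verify for your version.
CONTAINER_TYPE_KEY = "io.kubernetes.cri.container-type"
SANDBOX_ID_KEY = "io.kubernetes.cri.sandbox-id"


def classify_spec(spec: dict) -> Tuple[str, Optional[str]]:
    """Return (kind, sandbox_id) for an OCI-style spec dictionary (hypothetical helper)."""
    annotations = spec.get("annotations") or {}
    # Assumption of this sketch: a spec without the annotation starts its own sandbox.
    kind = annotations.get(CONTAINER_TYPE_KEY, "sandbox")
    if kind == "container":
        sandbox_id = annotations.get(SANDBOX_ID_KEY)
        if not sandbox_id:
            raise ValueError("container spec is missing its sandbox ID")
        return "container", sandbox_id
    return "sandbox", None


pause_spec = {"annotations": {CONTAINER_TYPE_KEY: "sandbox"}}
app_spec = {
    "annotations": {
        CONTAINER_TYPE_KEY: "container",
        SANDBOX_ID_KEY: "Pod_123",
    }
}
print(classify_spec(pause_spec))
print(classify_spec(app_spec))
```

A spec classified as "container" would be attached to the running Sentry named by the sandbox ID instead of starting a new one.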

How they talk: The Shim (Runtime V2)

The real "magic" happens in a component called the Shim.

In modern Kubernetes setups, containerd doesn't call the runsc binary directly for every single action. Instead, it uses containerd-shim-runsc-v1.

  1. First Container: containerd starts one Shim. That Shim starts the Sentry.
  2. Second Container: When a second container is added to the same Pod, containerd recognizes it belongs to the same Sandbox ID. Instead of starting a new Shim, it sends a message to the existing Shim.
  3. The Shim then tells the Sentry (which it is already managing) to: "Hey, start a new process and connect it to this new Gofer I just launched for you."
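
The "one Shim per sandbox" rule can be sketched as a lookup table keyed by sandbox ID. The Shim and ShimManager classes below are hypothetical and model only the bookkeeping; the real containerd-shim-runsc-v1 talks to containerd over an RPC interface that is not shown here.

```python
class Shim:
    """Hypothetical model of one shim managing one Sentry and its Gofers."""

    def __init__(self, sandbox_id):
        self.sandbox_id = sandbox_id
        self.sentry_started = False
        self.gofers = []  # one entry per container in the Pod

    def start_sandbox(self):
        # The Sentry is started once, when the sandbox is created.
        self.sentry_started = True

    def add_container(self, container_id):
        # Each container gets its own Gofer; the Sentry is reused.
        if not self.sentry_started:
            raise RuntimeError("sandbox has not been started")
        self.gofers.append(f"gofer-for-{container_id}")


class ShimManager:
    """Hypothetical stand-in for containerd's view of running shims."""

    def __init__(self):
        self.shims = {}  # sandbox_id -> Shim

    def shim_for(self, sandbox_id):
        # Start a shim only the first time a sandbox ID is seen.
        shim = self.shims.get(sandbox_id)
        if shim is None:
            shim = Shim(sandbox_id)
            shim.start_sandbox()
            self.shims[sandbox_id] = shim
        return shim


manager = ShimManager()
manager.shim_for("Pod_123").add_container("A")
manager.shim_for("Pod_123").add_container("B")
print(len(manager.shims), manager.shims["Pod_123"].gofers)
```

Both containers end up behind the same shim and Sentry, each with its own Gofer.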

Why gVisor needs this (The "Shared Kernel" Model)

This architecture is what allows gVisor to mimic a real Pod:

  • Shared Network Stack: Because there is only one Sentry for the Pod, all containers in that Pod share the same network stack (they can talk to each other on localhost).
  • Isolated Filesystems: Because there is one Gofer per container, Container A cannot see Container B's files (unless they share a volume). The Sentry reaches each container's files through a separate Gofer connection with its own host permissions.

Using Standard Runtime (runc)

With runc, the standard runtime used by Docker and Kubernetes, the "PodSandbox" isn't a virtual kernel.

Instead, the PodSandbox is a collection of shared Linux Namespaces held open by a tiny, "do-nothing" process called the Pause Container.

The "Pause" Container

In a standard Linux container environment, a "Sandbox" is just a boundary. To create this boundary before the actual application containers start, Kubernetes (via containerd/runc) starts a hidden container first.

  • Image: Usually registry.k8s.io/pause:3.x
  • Role: It does almost nothing. It starts and then goes to sleep (calls pause()), waking only to handle signals and, when the Pod shares a PID namespace, to reap zombie processes.
  • Purpose: It serves as the "anchor" for the namespaces that all other containers in the Pod will join.

How the Sandbox is constructed

When you create a Pod with two containers (App A and App B) using standard runc, here is what happens:

  1. Create the Sandbox: containerd tells runc to start the Pause Container.
    • runc creates a new set of Namespaces (Network, IPC, UTS).
    • The Pause Container "owns" these namespaces.
  2. Add Container A: containerd tells runc to start App A.
    • Crucially, it tells runc: "Do not create a new Network or IPC namespace. Instead, join the namespaces owned by the Pause Container."
  3. Add Container B: containerd tells runc to start App B.
    • Again, it joins the namespaces of the Pause Container.
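
In OCI runtime spec terms, "create a new namespace" versus "join an existing one" comes down to whether a namespace entry carries a path. The sketch below builds the namespace lists for the Pause Container and for an application container. The helper names and the PID 4242 are hypothetical, and giving each application container its own PID and mount namespace is an assumption that matches the Kubernetes default.

```python
# OCI namespace type -> file name under /proc/<pid>/ns/
SHARED_KINDS = {"network": "net", "ipc": "ipc", "uts": "uts"}


def pause_namespaces():
    """Namespace entries for the Pause Container: no path means 'create new'."""
    return [{"type": kind} for kind in SHARED_KINDS]


def app_namespaces(pause_pid):
    """Namespace entries for an application container in the same Pod."""
    # A path means 'join the namespace already held open by this process'.
    shared = [
        {"type": kind, "path": f"/proc/{pause_pid}/ns/{name}"}
        for kind, name in SHARED_KINDS.items()
    ]
    # Assumption: PID and mount namespaces stay private to each container.
    private = [{"type": "pid"}, {"type": "mount"}]
    return shared + private


print(pause_namespaces())
for entry in app_namespaces(4242):  # 4242 is a made-up PID for the pause process
    print(entry)
```

App A and App B would both receive the same three paths, which is what places them in the same Network, IPC, and UTS namespaces.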

What is actually "Shared" in this Sandbox?

When people say a Pod shares a "Sandbox," they specifically mean these Linux features:

  • Network Namespace: This is why containers in the same Pod can talk to each other on localhost. They are all looking at the same virtual network interface provided by the sandbox.
  • IPC Namespace: They can use System V IPC or POSIX message queues to talk to each other.
  • UTS Namespace: They all see the same Hostname.
  • Cgroups: They are often grouped under a parent Cgroup for resource accounting (so the Kubelet can see the total memory used by the whole Pod).
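
On a Linux node, one way to check which namespaces two processes share is to compare the symlinks under /proc/<pid>/ns. The sketch below is one way to do that; the function names and the proc_root parameter are introduced here for illustration, and reading another process's namespace links generally requires sufficient privileges (for example, root on the node).

```python
import os


def namespace_ids(pid, proc_root="/proc", kinds=("net", "ipc", "uts")):
    """Map each namespace kind to its identifier, e.g. 'net:[4026531840]'."""
    return {
        kind: os.readlink(os.path.join(proc_root, str(pid), "ns", kind))
        for kind in kinds
    }


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the sorted namespace kinds that both processes are members of."""
    ids_a = namespace_ids(pid_a, proc_root)
    ids_b = namespace_ids(pid_b, proc_root)
    return sorted(kind for kind in ids_a if ids_a[kind] == ids_b[kind])


# Demo only runs where the Linux namespace links exist.
if os.path.exists("/proc/self/ns/net"):
    print(shared_namespaces(os.getpid(), os.getpid()))
```

Comparing the PIDs of two containers in the same runc Pod should list net, ipc, and uts, while containers from different Pods should not share the net namespace.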

Comparison: Standard vs. gVisor

Feature         | Standard Sandbox (runc)                    | gVisor Sandbox (runsc)
----------------+--------------------------------------------+-------------------------------------------
The "Boundary"  | Linux Namespaces (Network, IPC, UTS)       | The Sentry (User-space Kernel)
Isolation Level | Logical separation within the Host Kernel  | Strong separation via Virtual Kernel
The "Anchor"    | The Pause Process (holds namespaces)       | The Sentry Process (emulates syscalls)
Kernel          | Containers share the Host Linux Kernel     | Containers share the Sentry "Guest" Kernel
Filesystem      | Managed by Host via Mount Namespaces       | Managed by Gofers for each container

Why do we call it a "Sandbox" in both cases?

The term "Sandbox" is an abstraction used by the CRI (Container Runtime Interface).

  • To Kubernetes, it doesn't matter how the isolation is done. It just says: "Give me a sandbox with this ID and these network settings."
  • runc fulfills that request by setting up Namespaces.
  • gVisor fulfills that request by spinning up a Sentry.
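
The abstraction can be sketched as one interface with two implementations. The class and function names below are hypothetical, and the returned strings only describe what each runtime would set up; nothing here calls a real runtime.

```python
from abc import ABC, abstractmethod


class SandboxBackend(ABC):
    """Hypothetical interface: what the caller asks for, not how it is done."""

    @abstractmethod
    def run_pod_sandbox(self, sandbox_id: str) -> str:
        """Create a sandbox and return a description of what backs it."""


class RuncBackend(SandboxBackend):
    def run_pod_sandbox(self, sandbox_id: str) -> str:
        # Standard runtime: a pause process holds the shared namespaces.
        return f"{sandbox_id}: pause process holding Network, IPC, UTS namespaces"


class GvisorBackend(SandboxBackend):
    def run_pod_sandbox(self, sandbox_id: str) -> str:
        # gVisor: the Sentry itself is the sandbox.
        return f"{sandbox_id}: Sentry process acting as a user-space kernel"


def request_sandbox(backend: SandboxBackend, sandbox_id: str) -> str:
    # The caller issues the same request regardless of the backend.
    return backend.run_pod_sandbox(sandbox_id)


for backend in (RuncBackend(), GvisorBackend()):
    print(request_sandbox(backend, "Pod_123"))
```

The caller's code is identical for both backends, which is the sense in which "Sandbox" is an abstraction rather than a specific mechanism.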