Kubernetes - PodSandbox
The CRI (Container Runtime Interface) Layer
"Pod" is not a native concept in core containerd.
While the core of containerd deals with containers, it ships a CRI plugin (historically the separate cri-containerd project). This plugin is what talks to Kubernetes (the Kubelet). The CRI explicitly defines two different types of objects:
- PodSandboxes
- Containers
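When you deploy a Pod in Kubernetes, the Kubelet sends containerd two separate types of commands. The first creates the Pod's sandbox. Then, for each container, another command creates that container inside the sandbox.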
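The call order can be sketched as a toy in-memory model. The class and method names below (FakeRuntimeService, run_pod_sandbox, create_container, start_container) are hypothetical. They only mirror two things from the CRI:

- its RunPodSandbox, CreateContainer and StartContainer calls;
- the rule that every container is created with the ID of an already existing sandbox.

The state strings follow the CRI's CONTAINER_CREATED / CONTAINER_RUNNING values.

```python
from dataclasses import dataclass, field


@dataclass
class PodSandbox:
    sandbox_id: str
    state: str = "SANDBOX_READY"
    container_ids: list[str] = field(default_factory=list)


@dataclass
class Container:
    container_id: str
    pod_sandbox_id: str  # every CRI container names the sandbox it lives in
    state: str = "CONTAINER_CREATED"


class FakeRuntimeService:
    """Hypothetical in-memory stand-in for a CRI runtime; not a real API."""

    def __init__(self) -> None:
        self.sandboxes: dict[str, PodSandbox] = {}
        self.containers: dict[str, Container] = {}

    def run_pod_sandbox(self, sandbox_id: str) -> str:
        # Step 1: the sandbox exists before any application container.
        self.sandboxes[sandbox_id] = PodSandbox(sandbox_id)
        return sandbox_id

    def create_container(self, pod_sandbox_id: str, name: str) -> str:
        # Step 2: a container can only be created inside an existing sandbox.
        sandbox = self.sandboxes[pod_sandbox_id]  # KeyError if the sandbox is unknown
        container_id = f"{pod_sandbox_id}/{name}"
        self.containers[container_id] = Container(container_id, pod_sandbox_id)
        sandbox.container_ids.append(container_id)
        return container_id

    def start_container(self, container_id: str) -> None:
        self.containers[container_id].state = "CONTAINER_RUNNING"


# The order in which the Kubelet drives the runtime for a two-container Pod.
runtime = FakeRuntimeService()
pod = runtime.run_pod_sandbox("Pod_123")
for name in ("A", "B"):
    runtime.start_container(runtime.create_container(pod, name))
```

Asking for a container in a sandbox that was never created fails. That is the point of the two-step protocol.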
Using gVisor (runsc)
Step 1: RunPodSandbox
The Kubelet tells containerd: "I need a new Pod Sandbox with this ID (e.g., Pod_123)."
- containerd calls gVisor (runsc).
- gVisor starts the Sentry process.
- In gVisor's world, this Sentry process is the sandbox. It starts waiting for instructions but isn't running your application code yet; the only thing running inside it is the placeholder pause process.
- At this stage, a Gofer is usually started for that pause container, to serve its root filesystem. Pseudo-filesystems such as /dev and /proc are implemented inside the Sentry itself, not served by a Gofer.
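A minimal sketch of the state after Step 1. It assumes a simplified model in which a Sentry is just a record of its Gofers and tasks. Gofer, Sentry, run_pod_sandbox and the rootfs path are all hypothetical; real runsc runs the Sentry and Gofers as separate host processes.

```python
from dataclasses import dataclass, field


@dataclass
class Gofer:
    """Hypothetical record of a host-side process serving one container's files."""
    container_id: str
    host_root: str  # host directory this Gofer is allowed to serve


@dataclass
class Sentry:
    """Hypothetical record of the sandbox: one user-space kernel per Pod."""
    sandbox_id: str
    gofers: dict[str, Gofer] = field(default_factory=dict)
    tasks: dict[str, str] = field(default_factory=dict)  # container_id -> program


def run_pod_sandbox(sandbox_id: str, pause_rootfs: str) -> Sentry:
    # Starting the Sentry is what creates the sandbox.
    sentry = Sentry(sandbox_id)
    # The sandbox's own (pause) container gets a Gofer for its root filesystem.
    sentry.gofers[sandbox_id] = Gofer(sandbox_id, pause_rootfs)
    # The only task so far is the placeholder pause process, not application code.
    sentry.tasks[sandbox_id] = "/pause"
    return sentry


sandbox = run_pod_sandbox("Pod_123", "/hypothetical/pause-rootfs")
```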
Step 2: CreateContainer
The Kubelet then tells containerd: "Now, create Container A and put it inside Sandbox Pod_123."
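- containerd looks up the existing sandbox (Pod_123).
- It calls runsc again. This time, the OCI spec that containerd generates for the container carries CRI annotations: io.kubernetes.cri.container-type set to "container" and io.kubernetes.cri.sandbox-id set to Pod_123. These say: "This is not a new sandbox; this is a container belonging to the existing Sentry for Pod_123." (The Runtime Handler only selects runsc as the runtime; it does not identify the sandbox.)
- gVisor then: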
- Starts a new Gofer specifically for Container A's root filesystem.
- The Sentry (which was already running) opens a connection to this new Gofer.
- The Sentry creates a new internal "Task" (a process) using the files provided by that Gofer.
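The "existing sandbox or new one?" decision can be sketched as a function that inspects the container's annotations. The two annotation keys are the ones containerd's CRI plugin writes. Everything else is a hypothetical model, not runsc's actual code: the Sentry record, the create function, the fallback when no annotation is present, and the paths.

```python
from dataclasses import dataclass, field

# Annotation keys written by containerd's CRI plugin into each container's OCI spec.
CONTAINER_TYPE = "io.kubernetes.cri.container-type"
SANDBOX_ID = "io.kubernetes.cri.sandbox-id"


@dataclass
class Sentry:
    """Hypothetical stand-in for one running Sentry (one per Pod)."""
    sandbox_id: str
    gofer_roots: dict[str, str] = field(default_factory=dict)  # container -> host dir
    tasks: list[str] = field(default_factory=list)  # containers running inside


def create(container_id: str, rootfs: str, annotations: dict[str, str],
           sentries: dict[str, Sentry]) -> Sentry:
    """Hypothetical routing of a 'create' request based on CRI annotations."""
    # Assumption: with no annotation, treat the container as its own sandbox.
    kind = annotations.get(CONTAINER_TYPE, "sandbox")
    if kind == "sandbox":
        sentry = Sentry(container_id)  # new Pod: start a fresh Sentry
        sentries[container_id] = sentry
    elif kind == "container":
        sentry = sentries[annotations[SANDBOX_ID]]  # existing Pod: reuse its Sentry
    else:
        raise ValueError(f"unknown container type: {kind!r}")
    sentry.gofer_roots[container_id] = rootfs  # one new Gofer per container
    sentry.tasks.append(container_id)  # one new task inside the Sentry
    return sentry


sentries: dict[str, Sentry] = {}
create("Pod_123", "/hypothetical/pause-rootfs",
       {CONTAINER_TYPE: "sandbox", SANDBOX_ID: "Pod_123"}, sentries)
create("A", "/hypothetical/a-rootfs",
       {CONTAINER_TYPE: "container", SANDBOX_ID: "Pod_123"}, sentries)
```

After both calls there is still only one Sentry. It now holds two tasks and two Gofer roots.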
How they talk: The Shim (Runtime V2)
The real "magic" happens in a component called the Shim.
In modern Kubernetes setups, containerd doesn't call the runsc binary directly for every single action. Instead, it uses containerd-shim-runsc-v1.
- First Container: containerd starts one Shim. That Shim starts the Sentry.
- Second Container: When a second container is added to the same Pod, containerd recognizes it belongs to the same Sandbox ID. Instead of starting a new Shim, it sends a message to the existing Shim.
- The Shim then tells the Sentry (which it is already managing): "Hey, start a new process and connect it to this new Gofer I just launched for you."
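A rough model of the shim reuse, assuming one dictionary keyed by sandbox ID. In containerd, the reuse is negotiated when the shim binary is started and finds a shim already serving that sandbox. The Shim and ShimManager classes here are hypothetical and collapse that negotiation into a lookup.

```python
class Shim:
    """Hypothetical stand-in for one containerd-shim-runsc-v1 process."""

    def __init__(self, sandbox_id: str) -> None:
        self.sandbox_id = sandbox_id
        self.sentry_running = False
        self.containers: list[str] = []

    def create(self, container_id: str) -> None:
        if not self.sentry_running:
            self.sentry_running = True  # first container: the shim starts the Sentry
        self.containers.append(container_id)  # later containers reuse that Sentry


class ShimManager:
    """Hypothetical containerd side: at most one shim per sandbox ID."""

    def __init__(self) -> None:
        self.shims: dict[str, Shim] = {}
        self.shims_started = 0

    def create_container(self, sandbox_id: str, container_id: str) -> Shim:
        shim = self.shims.get(sandbox_id)
        if shim is None:
            shim = Shim(sandbox_id)  # in reality, a new host process
            self.shims[sandbox_id] = shim
            self.shims_started += 1
        shim.create(container_id)  # in reality, an RPC to the shim
        return shim


manager = ShimManager()
for sandbox_id, container_id in [("Pod_123", "Pod_123"), ("Pod_123", "A"),
                                 ("Pod_123", "B"), ("Pod_456", "Pod_456")]:
    manager.create_container(sandbox_id, container_id)
```

Four containers across two Pods result in two shims, one per sandbox ID.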
Why gVisor needs this (The "Shared Kernel" Model)
This architecture is what allows gVisor to mimic a real Pod:
- Shared Network Stack: Because there is only one Sentry for the Pod, all containers in that Pod share the same network stack, so they can talk to each other on localhost.
- Isolated Filesystems: Because there is one Gofer per container, Container A cannot see Container B's files (unless they share a volume). The Sentry resolves each container's paths through that container's own Gofer connection, and each Gofer only serves its own container's root filesystem and mounts.
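The two properties can be illustrated with a toy Sentry. It owns one port table, standing in for the shared network stack, and one file map per container, standing in for the per-container Gofers. PodSentry and its methods are hypothetical; a real Sentry implements full TCP/IP and filesystem semantics.

```python
from dataclasses import dataclass, field


@dataclass
class PodSentry:
    """Hypothetical toy model of one Sentry shared by all containers in a Pod."""
    # A single network stack for the whole Pod: localhost port -> listening container.
    listeners: dict[int, str] = field(default_factory=dict)
    # One Gofer per container: container -> {path: contents} that Gofer serves.
    gofers: dict[str, dict[str, str]] = field(default_factory=dict)

    def listen(self, container: str, port: int) -> None:
        if port in self.listeners:
            # A shared stack means a shared port space, just like a real Pod.
            raise OSError(f"port {port} already bound by {self.listeners[port]}")
        self.listeners[port] = container

    def connect_localhost(self, port: int) -> str:
        # Any container can reach any listener in the Pod over localhost.
        if port not in self.listeners:
            raise ConnectionRefusedError(port)
        return self.listeners[port]

    def read_file(self, container: str, path: str) -> str:
        # Path lookups go only through the calling container's own Gofer.
        files = self.gofers[container]
        if path not in files:
            raise FileNotFoundError(path)
        return files[path]


pod = PodSentry(gofers={"A": {"/etc/a.conf": "config for A"},
                        "B": {"/etc/b.conf": "config for B"}})
pod.listen("A", 8080)
```

In this model, Container B can connect to A's port 8080 but cannot read /etc/a.conf, and B cannot bind 8080 a second time.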
Using Standard Runtime (runc)
When using the standard runc runtime used by Docker and Kubernetes, the "PodSandbox" isn't a virtual kernel.
Instead, the PodSandbox is a collection of shared Linux Namespaces held open by a tiny, "do-nothing" process called the Pause Container.
The "Pause" Container
In a standard Linux container environment, a "Sandbox" is just a boundary. To create this boundary before the actual application containers start, Kubernetes (via containerd/runc) starts a hidden container first.
- Image: Usually registry.k8s.io/pause:3.x
- Role: It does essentially nothing. It starts, and then it goes to sleep (calls pause()). Apart from that, it only exits on SIGINT/SIGTERM and reaps zombie children when the Pod shares a PID namespace.
- Purpose: It serves as the "anchor" for the namespaces that all other containers in the Pod will join.
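The behavior of the pause binary can be approximated in a few lines of Python. This is a sketch, not the real program, which is a small C binary. Besides sleeping, it exits on SIGINT/SIGTERM and reaps zombie children. Reaping matters when the Pod shares a PID namespace and the pause process is PID 1. The function names are hypothetical.

```python
import os
import signal
import sys


def terminate(signum, frame):
    """SIGINT/SIGTERM handler: exit cleanly."""
    sys.exit(0)


def reap_children(signum, frame):
    """SIGCHLD handler: collect exited children so they do not linger as zombies."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet


def main():
    signal.signal(signal.SIGINT, terminate)
    signal.signal(signal.SIGTERM, terminate)
    signal.signal(signal.SIGCHLD, reap_children)
    while True:
        signal.pause()  # sleep until a signal arrives, then go back to sleep
```

Calling main() blocks forever, which is exactly the point: the process exists only so its namespaces have an owner.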
How the Sandbox is constructed
When you create a Pod with two containers (App A and App B) using standard runc, here is what happens:
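- Create the Sandbox:
    - containerd tells runc to start the Pause Container.
    - runc creates a new set of Namespaces (Network, IPC, UTS). With containerd's CRI plugin, the network namespace is typically created and configured through CNI first, and the Pause Container is placed into it.
    - The Pause Container "owns" these namespaces.
- Add Container A:
    - containerd tells runc to start App A.
    - Crucially, it tells runc: "Do not create a new Network, IPC, or UTS namespace. Instead, join the namespaces owned by the Pause Container." In the OCI spec, this is expressed by giving each of those namespace entries a path to the existing namespace, such as /proc/PAUSE_PID/ns/net.
- Add Container B:
    - containerd tells runc to start App B.
    - Again, it joins the namespaces of the Pause Container.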
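How "join instead of create" looks in the OCI runtime spec can be sketched by generating the linux.namespaces list for each container. An entry with only a type asks runc to create a new namespace; an entry with a path asks it to join an existing one. A few things here are assumptions for illustration:

- the helper names and the PID 4242;
- the choice of private PID and mount namespaces for app containers, which is the default when the Pod does not share a process namespace.

```python
import json
from typing import Optional

# OCI runtime-spec namespace type -> file name under /proc/<pid>/ns/.
NS_FILE = {"network": "net", "ipc": "ipc", "uts": "uts", "pid": "pid", "mount": "mnt"}
# Namespaces every container in the Pod shares with the Pause Container.
POD_SHARED = ("network", "ipc", "uts")


def pause_namespaces(netns_path: Optional[str] = None) -> list[dict]:
    """linux.namespaces for the Pause Container: no "path" means "create new"."""
    namespaces = [{"type": t} for t in ("network", "ipc", "uts", "pid", "mount")]
    if netns_path is not None:
        # Optionally enter a network namespace prepared beforehand (e.g. for CNI).
        namespaces[0]["path"] = netns_path
    return namespaces


def app_namespaces(pause_pid: int) -> list[dict]:
    """linux.namespaces for an app container: join the shared ones, create the rest."""
    joined = [{"type": t, "path": f"/proc/{pause_pid}/ns/{NS_FILE[t]}"}
              for t in POD_SHARED]
    private = [{"type": "pid"}, {"type": "mount"}]
    return joined + private


print(json.dumps({"linux": {"namespaces": app_namespaces(4242)}}, indent=2))
```

Containers A and B would both get the same three joined paths. That is all "being in the same sandbox" means to runc.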
What is actually "Shared" in this Sandbox?
When people say a Pod shares a "Sandbox," they specifically mean these Linux features:
- Network Namespace: This is why containers in the same Pod can talk to each other on localhost. They are all looking at the same virtual network interface provided by the sandbox.
- IPC Namespace: They can use System V IPC or POSIX message queues to talk to each other.
- UTS Namespace: They all see the same Hostname.
- Cgroups: They are often grouped under a parent Cgroup for resource accounting (so the Kubelet can see the total memory used by the whole Pod).
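On a node, you can check which namespaces two processes share by comparing the links under /proc/PID/ns/: processes in the same namespace see the same identifier. This is a Linux-only sketch with hypothetical function names. Reading another process's entries typically requires root.

```python
import os


def namespace_id(pid: int, ns: str) -> str:
    """Return a namespace identifier such as 'net:[4026531992]'."""
    return os.readlink(f"/proc/{pid}/ns/{ns}")


def shares_namespace(pid_a: int, pid_b: int, ns: str) -> bool:
    """True if both processes are in the same namespace of kind `ns`."""
    # Same namespace <=> the two links name the same namespace inode.
    return namespace_id(pid_a, ns) == namespace_id(pid_b, ns)
```

Take a Pause Container and an app container of the same runc Pod, using PIDs you have looked up on the node. shares_namespace(pause_pid, app_pid, "net") should be True. The same check with "mnt" should be False, since each container keeps its own mount namespace.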
Comparison: Standard vs. gVisor
| Feature | Standard Sandbox (runc) | gVisor Sandbox (runsc) |
|---|---|---|
| The "Boundary" | Linux Namespaces (Network, IPC, UTS) | The Sentry (User-space Kernel) |
| Isolation Level | Logical separation within the Host Kernel | Strong separation via Virtual Kernel |
| The "Anchor" | The Pause Process (holds namespaces) | The Sentry Process (emulates syscalls) |
| Kernel | Containers share the Host Linux Kernel | Containers share the Sentry "Guest" Kernel |
| Filesystem | Managed by Host via Mount Namespaces | Managed by Gofers for each container |
Why do we call it a "Sandbox" in both cases?
The term "Sandbox" is an abstraction used by the CRI (Container Runtime Interface).
- To Kubernetes, it doesn't matter how the isolation is done. It just says: "Give me a sandbox with this ID and these network settings."
- runc fulfills that request by setting up Namespaces.
- gVisor fulfills that request by spinning up a Sentry.
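The abstraction can be sketched as one interface with two implementations, selected by a handler name. The CRI's RunPodSandboxRequest does carry a runtime_handler string, filled in from the Pod's RuntimeClass. The classes, the handler table, and the returned descriptions below are hypothetical.

```python
from abc import ABC, abstractmethod


class SandboxRuntime(ABC):
    """What the CRI asks of any runtime: 'give me a sandbox with this ID'."""

    @abstractmethod
    def run_pod_sandbox(self, sandbox_id: str) -> str: ...


class NamespaceSandbox(SandboxRuntime):
    def run_pod_sandbox(self, sandbox_id: str) -> str:
        # runc-style: a pause process anchoring shared Linux namespaces.
        return f"{sandbox_id}: pause process holding network/IPC/UTS namespaces"


class SentrySandbox(SandboxRuntime):
    def run_pod_sandbox(self, sandbox_id: str) -> str:
        # gVisor-style: a Sentry user-space kernel shared by all containers.
        return f"{sandbox_id}: Sentry user-space kernel"


# Hypothetical handler table; "" stands for the node's default runtime.
HANDLERS: dict[str, SandboxRuntime] = {
    "": NamespaceSandbox(),
    "runc": NamespaceSandbox(),
    "runsc": SentrySandbox(),
}


def run_pod_sandbox(sandbox_id: str, runtime_handler: str = "") -> str:
    # The caller's request looks the same whichever runtime serves it.
    return HANDLERS[runtime_handler].run_pod_sandbox(sandbox_id)
```

The caller never learns whether namespaces or a Sentry were used. It only gets back a sandbox for the ID it asked for.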