gVisor - runsc

runsc (short for "run Sandboxed Container") is the command-line executable and container runtime that powers gVisor.

If you think of gVisor as the "engine," runsc is the "interface" that allows tools like Docker and Kubernetes to use that engine. It is an OCI-compliant runtime, meaning it follows the same industry standards as runc (the default runtime used by Docker).

The "Drop-in Replacement"

The primary goal of runsc is to be a plug-and-play replacement for runc.

Standard Docker: Uses runc to create containers using standard Linux namespaces and cgroups (shared kernel).
gVisor Docker: Uses runsc to create containers where the application thinks it’s talking to a Linux kernel, but is actually talking to the gVisor Sentry.

Because it is OCI-compliant, you can run a gVisor container just by changing a flag:

docker run --runtime=runsc -it ubuntu bash
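
Docker only accepts --runtime=runsc once the runtime has been registered with the daemon. The following is a minimal sketch of that registration, assuming the binary is installed at /usr/local/bin/runsc and that the daemon reads its settings from /etc/docker/daemon.json. The helper name add_runsc_runtime is hypothetical and exists only for illustration.

```python
import json


def add_runsc_runtime(daemon_config, runsc_path="/usr/local/bin/runsc"):
    """Return a copy of a Docker daemon.json dict with a 'runsc' runtime entry.

    The install path is an assumption; adjust it to wherever runsc lives.
    """
    updated = dict(daemon_config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes["runsc"] = {"path": runsc_path}
    updated["runtimes"] = runtimes
    return updated


if __name__ == "__main__":
    # Print the fragment that would be merged into /etc/docker/daemon.json.
    print(json.dumps(add_runsc_runtime({}), indent=2))
```

Docker typically needs a restart before it picks up a changed daemon.json.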

What happens when you execute `runsc`?

When you start a container with runsc, it performs several high-level tasks:

Starts the Sentry: It launches the Sentry process, which is the "user-space kernel" written in Go. This Sentry will intercept all system calls from the application.
Starts the Gofer: It launches a "Gofer" process. Since the Sentry is not allowed to touch the host's file system directly for security reasons, the Gofer acts as a file proxy, providing files to the Sentry over a dedicated file protocol (9P in older releases, LISAFS in newer ones).
Sets up Sandboxing: It configures the "Platform" (like Ptrace or KVM) to ensure that the application process is trapped and cannot execute code directly on the host kernel.
Applies Seccomp Filters: It wraps itself in a very tight Seccomp filter, telling the host kernel: "I (runsc) am only allowed to do a very small list of things. If I try to do anything else, kill me."

Architecture Overview

runsc manages the lifecycle of the two main components of a gVisor sandbox:

Sentry: The kernel. It handles syscalls, manages memory, and schedules threads.
Gofer: The file system mediator. It sits between the Sentry and the host's actual files.

Why is it called `runsc`?

The name mirrors runc, the Open Container Initiative's reference runtime:

runc: The original run Container (the reference implementation).
runsc: run Sandboxed Container.

Integration with Kubernetes

In a Kubernetes environment, runsc is typically used via containerd or CRI-O. You define a RuntimeClass in Kubernetes:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc

Once this is set up, any Pod that specifies runtimeClassName: gvisor will be managed by runsc, providing that Pod with a higher level of security isolation.
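
To make the Pod side concrete, here is a minimal sketch that builds a Pod manifest selecting the RuntimeClass above. The function name gvisor_pod and the example Pod name and image are hypothetical. The manifest fields (apiVersion, kind, spec.runtimeClassName, spec.containers) are standard Kubernetes fields.

```python
import json


def gvisor_pod(name, image, runtime_class="gvisor"):
    """Build a Pod manifest whose containers are run by the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Must match metadata.name of the RuntimeClass object.
            "runtimeClassName": runtime_class,
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. piped into `kubectl apply -f -`.
    print(json.dumps(gvisor_pod("sandboxed-nginx", "nginx"), indent=2))
```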

Source code: Why is runsc in its own top-level folder?

In Go project architecture, the location of a folder tells you more about how the code is used than about how "important" it is.

The fact that runsc is in a top-level folder doesn't mean it isn't "core"; it means runsc is a binary (a command-line tool), while things inside pkg/ are libraries.

The "Command vs. Library" Pattern

In the Go ecosystem, there is a common convention:

pkg/: Contains the reusable logic. This is the "meat" of the project. If you were writing another tool that needed gVisor's features, you would import things from pkg/.
Top-level folders (or cmd/): These contain the main packages. These are the actual programs that you compile into an executable file.

runsc is the entry point. When you build it (for example with go build ./runsc on the Go-tooling branch of the repository; the main branch is built with Bazel), you get the actual file that you put in /usr/local/bin/runsc. If runsc were buried deep inside pkg/, it would be harder to find and build as a standalone tool.

What is actually inside `runsc/`?

If you look inside the runsc/ directory, you will see code related to:

CLI Flag Parsing: Handling commands like run, boot, delete, list.
OCI Compatibility: Code that makes sure gVisor follows the Open Container Initiative specs.
Sandbox Orchestration: The logic that decides how to launch the Sentry process and the Gofer process and how to connect them.

Other Top-Level Folders

You will notice other top-level folders that follow the same pattern:

shim/: This is the "containerd-shim." It's another binary used to integrate gVisor deeply into containerd.
webhook/: A binary used for Kubernetes admission controllers.

Are Gofers and Sentry separate binaries?

No. There's only one binary: runsc. Sentry and Gofers are subprocesses of it.

When you start a container, the host sees a tree of processes all originating from the runsc binary.

Here is the hierarchy:

runsc (The Parent/Manager): This is the entry point (the "shim") that stays alive to manage the container's lifecycle.
Sentry (Subprocess): runsc spawns a process (often via runsc boot) that acts as the Sentry. This is the isolated kernel where your app actually runs.
Gofer (Subprocess): runsc also spawns one or more processes (via runsc gofer) to act as the Gofer. These sit between the Sentry and the host's actual files.

So, while they are all the same file on your disk, they are separate processes in your memory with different permissions.
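
One way to see this split on a host is to look at the arguments each process was started with. The sketch below assumes, as described above, that the Sentry is started through the boot subcommand and the Gofer through the gofer subcommand. The function name and the sample argument lists are hypothetical, and real command lines carry many more flags.

```python
from typing import Sequence


def classify_runsc_process(argv: Sequence[str]) -> str:
    """Guess the role of a runsc process from its argument vector.

    Assumption: 'boot' marks the Sentry and 'gofer' marks the Gofer;
    anything else is treated as a management invocation.
    """
    if "boot" in argv:
        return "sentry"
    if "gofer" in argv:
        return "gofer"
    return "manager"


if __name__ == "__main__":
    samples = [
        ["runsc", "--root=/run/runsc", "boot", "my-container"],
        ["runsc", "--root=/run/runsc", "gofer"],
        ["runsc", "run", "my-container"],
    ]
    for argv in samples:
        print(classify_runsc_process(argv), " ".join(argv))
```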

runsc Summary

It is the CLI tool for gVisor.
It is OCI-compliant, making it compatible with Docker and Kubernetes.
It launches and manages the Sentry and Gofer processes.
It provides the "Sandbox," which makes it much harder for a compromised application to break out into the host kernel.

CLI Options

When you use runsc (the gVisor runtime), you aren't just starting a container; you are starting a guest kernel. Because the Sentry (gVisor's kernel) is doing so much work to emulate Linux, these flags are essential for understanding why an application might be failing or performing poorly.

`--debug` and `--debug-log`

These flags tell the Sentry process itself to start talking.

--debug: Enables internal logging for the gVisor runtime. This isn't your application's logs; it is the Sentry’s log. It will show you things like memory mapping, internal thread scheduling, and the Sentry's initialization steps.
--debug-log: This points to a directory where gVisor will dump its logs.
- Crucial Detail: Because gVisor starts multiple processes (the Sentry, the Gofer, etc.), it will create multiple files in this directory, one for each process.

Example usage:

runsc --debug --debug-log=/tmp/gvisor-logs/ run my-container

`--strace` (for syscalls)

This is arguably the most powerful tool in the gVisor arsenal. In a normal system, you use strace to see syscalls between an app and the Linux kernel. In gVisor, the Sentry is the kernel, so it has a built-in strace feature.

What it does: It captures every system call the application makes to the Sentry and logs it to the debug log.
Why use it? If an app works in Docker/runc but fails in gVisor, it’s usually because the Sentry hasn't implemented a specific flag for a syscall yet. --strace will show you exactly which syscall returned ENOSYS (Function not implemented).

`--strace-syscalls` (The Filter)

A full strace is a "firehose" of data. If your app is doing thousands of read operations, the logs will be impossible to read.

What it does: It limits the strace to specific syscalls.
Example: --strace --strace-syscalls=open,socket,connect
- This will only show you file opening and network activity, ignoring everything else.
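
The debug and strace flags are usually combined, so it can help to assemble them in one place. The following is a minimal sketch that builds such a command line using only the flags described above. The function name runsc_debug_args is hypothetical, and the trailing slash on the log directory simply mirrors the earlier example.

```python
def runsc_debug_args(container_id, log_dir, syscalls=()):
    """Build an argument list for a debug run with an optional strace filter."""
    # Keep exactly one trailing slash, matching the --debug-log example above.
    log_dir = log_dir.rstrip("/") + "/"
    args = ["runsc", "--debug", "--debug-log=" + log_dir, "--strace"]
    if syscalls:
        # Restrict the strace output to the named syscalls.
        args.append("--strace-syscalls=" + ",".join(syscalls))
    args += ["run", container_id]
    return args


if __name__ == "__main__":
    print(" ".join(runsc_debug_args(
        "my-container", "/tmp/gvisor-logs", ["open", "socket", "connect"])))
```

The resulting list can be handed to subprocess.run on a machine where runsc is installed.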

`--log-packets` (The Network Sniffer)

Since gVisor has its own internal network stack (Netstack), standard host tools like tcpdump sometimes don't see the internal logic of how gVisor is handling a packet.

What it does: It logs the contents of every network packet (Ethernet, IP, TCP/UDP) that passes through the Sentry.
Use case: Debugging complex networking issues inside the sandbox where you suspect the gVisor Netstack might be dropping packets.

`--platform` (The Interception Logic)

As noted earlier, gVisor can intercept syscalls using different "platforms." This flag lets you choose:

--platform=ptrace: Uses the ptrace syscall. Slow, but works everywhere (even inside another VM).
--platform=kvm: Uses hardware virtualization. Much faster, but requires /dev/kvm access.
--platform=systrap: The modern default that uses seccomp and shared memory.
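
Since the KVM platform depends on access to /dev/kvm, a launcher script can check for the device before choosing a platform. This is a minimal sketch of that check only. The function name pick_platform is hypothetical, and whether KVM is actually the better choice for a given workload is not something this check can decide.

```python
import os


def pick_platform(kvm_device="/dev/kvm", fallback="systrap"):
    """Return 'kvm' if the KVM device is readable and writable, else the fallback."""
    # os.access returns False when the path does not exist.
    if os.access(kvm_device, os.R_OK | os.W_OK):
        return "kvm"
    return fallback


if __name__ == "__main__":
    print("--platform=" + pick_platform())
```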

`--network` (Isolation Level)

--network=sandbox (Default): The Pod gets its own isolated network stack (Netstack). Very secure.
--network=host: The Pod shares the host's network stack. Less secure, but higher performance.

How to set these in Kubernetes/Containerd

Most people don't run runsc manually; containerd does it for them. To enable these flags in a K8s cluster, point the runsc runtime entry in the containerd configuration at a separate runsc config file, and put the flags in that file's [runsc_config] table:

/etc/containerd/config.toml

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
    TypeUrl = "io.containerd.runsc.v1.options"
    # Path to the runsc shim config file shown next
    ConfigPath = "/etc/containerd/runsc.toml"

/etc/containerd/runsc.toml

[runsc_config]
  # Each key is a runsc flag name without the leading dashes
  # Example: Enabling strace for all gVisor pods
  debug = "true"
  debug-log = "/var/log/gvisor/"
  strace = "true"

Log file suffix

When you enable debugging in runsc (gVisor) using the --debug-log flag, gVisor doesn't create just one log file. Instead, it creates a separate log file for every process and every major OCI lifecycle command.

This is because runsc follows the OCI (Open Container Initiative) standard, which breaks container execution into distinct phases. Since gVisor is a "sandbox," it also starts auxiliary processes (like the Gofer) that need their own logs.

The Common Pattern

The files are usually named like this: runsc.log.20240520-103000.123456.boot (Format: runsc.log.[Date]-[Time].[Microseconds].[Command]; some releases also append .txt)
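
When a log directory holds dozens of these files, it helps to split the names into fields. The following is a minimal sketch that parses a name shaped like the example above into its date, time, numeric field, and command. The regular expression is an assumption based on that single example, and the optional .txt ending it tolerates is also an assumption.

```python
import re

# Shape assumed from the example: runsc.log.<date>-<time>.<digits>.<command>
LOG_NAME = re.compile(
    r"^runsc\.log\."
    r"(?P<date>\d{8})-(?P<time>\d{6})\."
    r"(?P<number>\d+)\."
    r"(?P<command>[a-z-]+)"
    r"(?:\.txt)?$"
)


def parse_log_name(filename):
    """Return the fields of a runsc debug log name, or None if it does not match."""
    match = LOG_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_log_name("runsc.log.20240520-103000.123456.boot"))
```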

Suffix Meanings

Order: .create (Sandbox is built) => .boot (Sentry/Kernel is up) => .start (the "App" is running).

`.create`

This log covers the initial setup of the container.

What’s inside: It tracks the creation of the sandbox environment, setting up Linux namespaces (network, UTS, etc.), and configuring cgroups.
Why check it: If your container fails to even "exist" (e.g., a "RuntimeError" during creation), look here.

`.boot` (The Most Important File)

In gVisor, the .boot log belongs to the Sentry (the guest kernel).

What’s inside: This is the most detailed log. It shows the gVisor kernel "booting" up. It includes:
- Memory management initialization.
- Loading the application binary.
- Syscall Interception: Every time your app makes a syscall that gVisor doesn't support, it will be logged here (look for ENOSYS).
Why check it: This is the primary file for debugging why an application is crashing or misbehaving inside the sandbox.

`.start`

This log tracks the transition from a "Created" state to a "Running" state.

What’s inside: It captures the moment the Sentry actually begins executing the application's entrypoint code.
Why check it: Look here if your container is created successfully but crashes the split-second it tries to run.

`.gofer`

The Gofer is a separate process in gVisor that acts as a file-system proxy.

What’s inside: Since the Sentry is not allowed to talk to the host filesystem directly for security reasons, it asks the Gofer to do it. This log tracks file opening, reading, and permission checks.
Why check it: If you are getting "Permission Denied" or "File Not Found" errors even though the files exist on the host, the Gofer log will tell you why.

`.delete` / `.kill`

What’s inside: These capture the cleanup phase. They show gVisor tearing down the namespaces, releasing memory, and shutting down the Sentry and Gofer processes.

Why are they split up?

OCI Compliance: Tools like containerd call runsc multiple times (once for create, once for start, etc.). Each call is a new process with a new PID, so it gets its own log file.
Security Isolation: The Sentry and the Gofer are separate processes with different permissions. By splitting the logs, gVisor makes it clear which process encountered an error.
Debugging Clarity: If the sandbox fails to boot, you don't want to dig through thousands of lines of filesystem logs (Gofer) to find the kernel error (Sentry).

How to use them effectively

If you have a crashing container, follow this order:

Check .boot first: Look for "panic," "fatal," or specific syscall errors (ENOSYS).
Check .gofer second: If you suspect a file access issue.
Check .create third: If the container never reaches the "running" state.
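
The triage order above can be applied mechanically to a directory listing. This is a minimal sketch, assuming the command name appears as a dot-separated part of each file name as in the naming pattern shown earlier. The function name triage_order and the sample file names are hypothetical.

```python
# Lower number means "read this first", following the order given above.
TRIAGE_PRIORITY = {"boot": 0, "gofer": 1, "create": 2}


def triage_order(filenames):
    """Sort log file names so .boot comes first, then .gofer, then .create."""
    def rank(name):
        parts = name.split(".")
        for command, priority in TRIAGE_PRIORITY.items():
            if command in parts:
                return priority
        # Everything else (start, delete, kill, ...) goes last.
        return len(TRIAGE_PRIORITY)

    return sorted(filenames, key=lambda name: (rank(name), name))


if __name__ == "__main__":
    for name in triage_order([
        "runsc.log.20240520-103001.000001.start",
        "runsc.log.20240520-103000.000002.create",
        "runsc.log.20240520-103000.000003.gofer",
        "runsc.log.20240520-103000.000004.boot",
    ]):
        print(name)
```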

Pro-Tip: Because there are so many files, use grep across the whole directory:

grep -rnE "error|panic|fatal|ENOSYS" /path/to/your/log/dir/
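
If the matches need further processing, for example grouping them by file, the same search can be done in a script. The following is a minimal sketch of a Python equivalent of the grep command above, using the same case-sensitive pattern. The function name scan_logs is hypothetical.

```python
import os
import re

# Same alternatives as the grep pattern above (case-sensitive).
PATTERN = re.compile(r"error|panic|fatal|ENOSYS")


def scan_logs(log_dir):
    """Return (path, line_number, line) for every matching line under log_dir."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for fname in sorted(files):
            path = os.path.join(root, fname)
            # errors="replace" keeps the scan going if a log has odd bytes.
            with open(path, "r", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if PATTERN.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys
    for path, lineno, line in scan_logs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("%s:%d:%s" % (path, lineno, line))
```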

Runtime Monitoring

Does `runsc trace create` start a new monitoring process?

No. It only tells the Sentry to connect to the socket as a client. When you run runsc trace create, you are simply issuing a management command to an already running sandbox.

The runsc trace create command itself is short-lived; it sends the configuration to the Sentry (the sandbox kernel) and then exits.

Upon receiving this instruction, the Sentry acts as a client and attempts to connect() to the Unix Domain Socket (UDS) path you provided.
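
The direction of the connection is the part that is easy to get backwards: the monitoring process owns and listens on the socket, and the sandbox connects to it. The sketch below illustrates only that client/server relationship with a plain Unix Domain Socket. A thread stands in for the Sentry, the function names are hypothetical, and gVisor's actual socket type and wire format are not modeled.

```python
import os
import socket
import tempfile
import threading


def sandbox_stand_in(path, payload):
    """Plays the Sentry's role: connect() to the socket as a client and send data."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(payload)


def receive_one_message(path, payload):
    """Plays the monitoring process: own the socket, accept one client, read its data."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)
        server.listen(1)
        # The listener exists before the client starts, so connect() cannot race it.
        client_thread = threading.Thread(target=sandbox_stand_in, args=(path, payload))
        client_thread.start()
        conn, _addr = server.accept()
        with conn:
            data = conn.recv(4096)
        client_thread.join()
    return data


if __name__ == "__main__":
    sock_path = os.path.join(tempfile.mkdtemp(), "trace.sock")
    print(receive_one_message(sock_path, b"event from sandbox"))
```

The practical consequence is that the listener must already be running at the socket path before the trace session is created.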
