gVisor
gVisor is an open-source "application kernel" developed by Google. It provides a layer of isolation between a containerized application and the host operating system kernel, designed to make containers as secure as virtual machines (VMs) without the heavy resource overhead.
In the context of AI, gVisor is increasingly vital for running untrusted models, executing AI-generated code, and securing multi-tenant GPU environments.
How gVisor Works
In a standard container (like Docker), the application shares the host's Linux kernel. If the application is compromised, it could exploit a kernel vulnerability to "break out" and take over the entire server. gVisor prevents this by intercepting system calls.
The Core Components:
- The Sentry (The "User-Space Kernel"): This is the heart of gVisor. Written in Go (a memory-safe language), it acts as a guest kernel that lives in user space. When an application makes a system call (like open() on a file or socket() for networking), the Sentry intercepts it and handles it internally rather than letting it reach the host kernel.
- The Gofer (File System Proxy): To keep the Sentry even more isolated, it is not allowed to access the file system directly. Instead, it talks to a separate process called the Gofer, which fetches files on its behalf.
- Platforms (Interception): gVisor uses different mechanisms (like ptrace or KVM) to redirect system calls from the application to the Sentry.
The result: The host kernel only sees a small, predictable set of system calls from gVisor itself, rather than the hundreds of potentially dangerous calls from the untrusted application.
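The interception model above can be sketched as a dispatch table. This is a hypothetical illustration, not gVisor's actual code: the handler bodies and return values are invented, though the syscall numbers follow the Linux x86-64 convention.

```go
package main

import "fmt"

// Hypothetical sketch: a user-space kernel keeps a table mapping
// syscall numbers to Go handlers, so intercepted calls are serviced
// internally instead of reaching the host kernel.
type handler func(args ...uint64) (uint64, error)

// Linux x86-64 numbering: 0 = read, 257 = openat.
var syscallTable = map[int]handler{
	0: func(args ...uint64) (uint64, error) {
		return 0, nil // read: serviced from the Sentry's own state
	},
	257: func(args ...uint64) (uint64, error) {
		return 3, nil // openat: would be forwarded to the Gofer
	},
}

// dispatch models what runs after the platform traps a syscall.
func dispatch(nr int, args ...uint64) (uint64, error) {
	h, ok := syscallTable[nr]
	if !ok {
		// Unimplemented syscalls fail with ENOSYS, which is why apps
		// relying on niche syscalls can break under gVisor.
		return 0, fmt.Errorf("ENOSYS: syscall %d not implemented", nr)
	}
	return h(args...)
}

func main() {
	fd, _ := dispatch(257)
	fmt.Println("openat returned fd", fd)
	if _, err := dispatch(9999); err != nil {
		fmt.Println(err)
	}
}
```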
Where is it used?
gVisor is used in environments where security and multi-tenancy are priorities:
- Google Cloud: It powers GKE Sandbox; more details below.
- SaaS Platforms: Companies that let users upload and run their own code (like CI/CD pipelines or online IDEs) use gVisor to ensure one user's code can't spy on another's.
- Financial & Healthcare: Organizations running sensitive workloads in Kubernetes use it as an extra layer of "defense in depth."
gVisor in Google Cloud
While Cloud Run v2 no longer uses gVisor, gVisor still provides "Defense in Depth" for several other services:
- Cloud Run (First Generation): If you specifically select the "First Generation" execution environment in Cloud Run settings, your code still runs inside gVisor.
- App Engine Standard Environment: Most runtimes (Python, Node.js, Go, etc.) in App Engine Standard use gVisor to isolate user code from the host.
- GKE Sandbox: This is the most common place for developers to use gVisor manually. It allows you to run a "Sandboxed" node pool where every Pod is wrapped in gVisor.
- Cloud Functions (1st Gen): The original Cloud Functions architecture uses gVisor. (2nd Gen Cloud Functions are built on Cloud Run v2 and thus use microVMs).
Why did Cloud Run v2 move away from gVisor?
The reason Cloud Run v2 moved to microVMs is compatibility. gVisor has to re-implement every Linux system call in Go; if your app uses a niche system call that gVisor hasn't implemented yet, the call fails (typically with ENOSYS) and the app can crash. MicroVMs run a real Linux kernel, so they "just work."
The Big Shift: gVisor in AI
gVisor has found a massive second life in the AI world because it solves a specific problem that microVMs struggle with: GPU-accelerated sandboxing.
A. GKE Sandbox for AI Agents
Google recently introduced "GKE Sandbox for Agents." When an AI agent (like a LangChain agent) generates and executes Python code, that code is "untrusted." If it runs on a standard container, a "hallucinated" or malicious command could potentially escape to the host.
- The AI Use Case: GKE uses gVisor to create an ephemeral sandbox for that specific code execution. If the AI-generated code tries to wipe the server or scan the internal network, gVisor intercepts the system calls and blocks them.
B. GPU Isolation (nvproxy)
Until recently, you couldn't easily use a GPU inside a gVisor sandbox because GPUs require direct kernel access. Google solved this with nvproxy:
- How it works: gVisor intercepts CUDA and NVIDIA driver calls from the application and proxies them safely to the host GPU driver.
- Why it matters for AI: This allows multi-tenant AI platforms to share expensive GPUs between different customers. Each customer’s training or inference job is isolated by gVisor, preventing one user from accessing another user’s model weights or data in GPU memory.
C. Industry Usage (OpenAI & Anthropic)
Because gVisor is open-source, it is used heavily outside of Google Cloud by the world's leading AI companies:
- OpenAI: Uses gVisor for high-risk tasks, specifically sandboxing code execution within ChatGPT (e.g., the Advanced Data Analysis feature).
- Anthropic: Is a major contributor to the gVisor project and uses it to isolate their internal AI research environments.
Are processes in gVisor sandboxes visible on the host?
No.
If you run a Python app in a standard container (runc), you can see python in the host's ps aux output. When using gVisor, you will never see "python" on the host.
What you see on the Host
If you run ps aux | grep runsc on your host while a gVisor container is running, you will see two main types of real Linux processes:
- The Sentry (runsc-sandbox): This is the "Guest Kernel." To the host, your entire application, no matter how many processes or threads it has, is just one single process (the Sentry).
- The Gofer (runsc-gofer): This is the file-system proxy. It is another real Linux process that sits next to the Sentry to handle file I/O.
Are the "App" processes real?
Inside the container, if you type ps aux, you might see:
- PID 1: python app.py
- PID 10: sidecar-helper
On the host, these do not exist as Linux processes.
The Sentry acts as a "Virtual Machine" for processes. It manages its own internal "Process Table." When your Python app starts a new thread or a new process, the Sentry simply allocates some internal memory and tracks a new "Task" inside its Go code.
To the Host Kernel: All those Python threads are just threads belonging to the Sentry process. The host kernel has no idea that "Python" is running; it only knows that the "Sentry" (a Go program) is very busy.
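A loose analogy in plain Go (no gVisor required): however many concurrent workers run inside one process, the host kernel sees a single PID, just as it sees only the Sentry for an entire sandboxed application.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// samePIDInAllWorkers spawns several concurrent workers and checks
// that every one of them reports the same host process ID.
func samePIDInAllWorkers() bool {
	var wg sync.WaitGroup
	pids := make([]int, 4)
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			pids[i] = os.Getpid() // identical in every worker
		}(i)
	}
	wg.Wait()
	for _, p := range pids[1:] {
		if p != pids[0] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("all workers share one host PID:", samePIDInAllWorkers())
}
```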
Then what are they?
To understand what they are, we have to look at the difference between Logic (how gVisor tracks them) and Execution (how the host runs them).
The short answer is: Logic-wise, they are Go structs; Execution-wise, they are Host Linux Threads.
Inside the Sentry (which is written in Go), there is no fork() or clone() that talks to the host. Instead, gVisor has a struct called a Task.
- When your app calls clone() to create a new thread, gVisor simply creates a new Task object in its memory.
- This object contains the virtual registers (RIP, RAX, etc.), the virtual stack pointer, and the virtual PID.
- Are they Goroutines? No. While gVisor is written in Go, it cannot use standard Goroutines to run your application code. Goroutines are managed by the Go runtime and are "cooperatively scheduled." An application thread needs to be "preemptively scheduled" and have direct access to CPU registers, which Go's scheduler doesn't allow for external code.
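The bookkeeping above can be sketched as plain Go structs. The field and type names here are illustrative, not gVisor's real definitions:

```go
package main

import "fmt"

// Registers holds the task's virtual CPU state (illustrative subset).
type Registers struct {
	RIP, RSP, RAX uint64 // virtual instruction pointer, stack pointer, return value
}

// Task is one application thread as the Sentry tracks it: pure data,
// not a host process.
type Task struct {
	VirtualPID int
	Regs       Registers
}

// Kernel is the Sentry's internal process table.
type Kernel struct {
	nextPID int
	tasks   map[int]*Task
}

// Clone models what happens when the app calls clone(): no host
// syscall is made, just a new entry in the in-memory table.
func (k *Kernel) Clone() *Task {
	k.nextPID++
	t := &Task{VirtualPID: k.nextPID}
	k.tasks[t.VirtualPID] = t
	return t
}

func main() {
	k := &Kernel{tasks: map[int]*Task{}}
	t1 := k.Clone()
	t2 := k.Clone()
	fmt.Println("virtual PIDs:", t1.VirtualPID, t2.VirtualPID, "tasks:", len(k.tasks))
}
```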
The Execution: They are Host Linux Threads. To actually execute the code inside that Task object, gVisor needs a "vehicle" that the Host Kernel understands. That vehicle is a Host Linux Thread.
gVisor uses a pool of host threads. When a "Task" (your app's thread) is ready to run:
- gVisor picks a Host Linux Thread.
- It uses runtime.LockOSThread() (a Go runtime call) to "lock" that host thread so the Go scheduler doesn't move it.
- It then uses the Platform (KVM, Ptrace, or Systrap) to "jump" the CPU into the application's code.
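The pinning step can be demonstrated with standard Go. This is only a sketch of the idea; the real Sentry's thread management is far more involved:

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned runs a task on a goroutine that is locked to one host OS
// thread, so the Go scheduler cannot migrate it mid-execution.
func runPinned(task func()) {
	done := make(chan struct{})
	go func() {
		runtime.LockOSThread() // pin this goroutine to its host thread
		defer runtime.UnlockOSThread()
		task() // here the real Sentry would enter guest mode
		close(done)
	}()
	<-done
}

func main() {
	ran := false
	runPinned(func() { ran = true })
	fmt.Println("task ran on a locked host thread:", ran)
}
```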
The Mapping: Is it 1:1?
This depends on which Platform you are using:
- Ptrace Platform: Usually 1:1. For every thread your application creates inside the sandbox, gVisor creates one real Linux thread on the host, because the ptrace system call is tied to specific thread IDs.
- KVM Platform: Not necessarily 1:1. If you tell gVisor it has 4 vCPUs, it will create 4 host threads (acting as virtual CPUs) and "multiplex" your application's threads across those 4 host threads, similar to how a real physical CPU works.
Why this matters: The "Context Switch"
This explains why gVisor has a performance overhead. Look at what happens when your app makes a syscall:
- The app thread (running on a host thread) hits a SYSCALL instruction.
- The CPU traps (stops).
- The host thread switches from "Guest Mode" (running your app) back to "Host Mode" (running gVisor's Go code).
- The Sentry (Go code) looks at the registers, sees that it's a read() call, and handles it.
- The host thread then "jumps" back into "Guest Mode" to continue your app.
A switch between goroutines is roughly a function call; in gVisor, this switch is a full CPU context transition, which is much "heavier."
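The round trip above can be simulated as a toy loop, with each syscall counted as one bounce back into the Sentry. Event names and the scripted execution are illustrative:

```go
package main

import "fmt"

// event is what the platform reports when the app stops running.
type event int

const (
	evSyscall event = iota // app trapped on a SYSCALL instruction
	evDone                 // app exited
)

// countTransitions walks a scripted app execution and counts how many
// times control bounces from "guest mode" back to the Sentry's Go code
// before the app exits.
func countTransitions(steps []event) int {
	switches := 0
	for _, ev := range steps { // each iteration: a stretch of guest mode
		if ev == evDone {
			break
		}
		switches++ // trap: back in host mode, the Sentry handles the call
	}
	return switches
}

func main() {
	steps := []event{evSyscall, evSyscall, evDone}
	fmt.Println("context transitions:", countTransitions(steps))
}
```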
Why is it done this way? (The Security Win)
This is the core of "Defense in Depth."
If you use a standard container and the app finds a bug in the Linux kernel's clone() (process creation) logic, it can attack the host directly because the host kernel is managing that process.
In gVisor, because the application processes are just "data structures" inside the Sentry's memory:
- The app cannot see other processes on the host.
- The app cannot use host-level process signals (like kill) against host processes.
- If the app "escapes" its internal process boundaries, it only escapes into the Sentry, which is itself a restricted, unprivileged process on the host.
How does the Sentry handle a kill -9 sent from the host to the runsc-sandbox process?
When you send a kill -9 (SIGKILL) to a process, you are using the "Nuclear Option" of the Linux kernel. To understand how gVisor handles this, you have to remember one absolute rule of Linux:
No process can "handle," catch, or ignore a kill -9.
Unlike kill -15 (SIGTERM), which is a polite request to "please shut down," SIGKILL is an order to the Host Kernel to immediately delete the process from the CPU and RAM.
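The "no handling" rule can be shown on a plain Linux process, no gVisor required: a sleeping child receives SIGKILL and dies without any chance to react. (Assumes a Unix host with a sleep binary on PATH.)

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// killedBySIGKILL starts a sleeping child, sends it kill -9, and
// verifies that the child was terminated by the signal itself.
func killedBySIGKILL() bool {
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// Equivalent of `kill -9 <pid>` aimed at the child.
	if err := cmd.Process.Signal(syscall.SIGKILL); err != nil {
		panic(err)
	}
	cmd.Wait() // returns the error "signal: killed"
	ws := cmd.ProcessState.Sys().(syscall.WaitStatus)
	return ws.Signaled() && ws.Signal() == syscall.SIGKILL
}

func main() {
	fmt.Println("child killed by SIGKILL:", killedBySIGKILL())
}
```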
The Sentry Dies Instantly
The Host Linux Kernel receives the SIGKILL signal aimed at the Sentry's PID.
- The kernel immediately stops the Sentry's execution.
- It reclaims all the RAM the Sentry was using.
- The Sentry does not get a single CPU cycle to clean up. It cannot save state, it cannot close files gracefully, and it cannot notify the application inside.
The Application Inside Vanishes
Because the application's "processes" were actually just data structures and threads inside the Sentry's memory space, they disappear the exact microsecond the Sentry dies.
- To the application (e.g., a Python script or a Database), it is as if the universe suddenly ceased to exist.
- If the app was in the middle of writing a file, that file might be corrupted or left partially written, because the Sentry didn't have time to "flush" the data to the host.
The Gofer Detects the "Broken Pipe"
The Gofer (the filesystem proxy) is a separate process. It usually communicates with the Sentry via a Unix Domain Socket or a Pipe.
- Once the Sentry is killed, the Gofer's connection to it is "broken" (the host kernel closes the Sentry's end of the pipe).
- The Gofer sees an ECONNRESET error or an end-of-file (EOF).
- The Gofer, realizing its "master" is gone, will then exit on its own.
The Shim Reports the Tragedy
The containerd-shim is sitting outside the sandbox, watching.
- It sees that its child process (the Sentry) has exited with Exit Code 137.
- Calculation: In Linux, an exit code of 128 + signal number means the process was killed by that signal. SIGKILL is signal 9, so 128 + 9 = 137.
- The Shim reports this code back to containerd and Kubernetes.
- In your kubectl get pods output, you will see the status: Terminated: Error (Exit Code 137).
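The shim's arithmetic in code form (the function names are illustrative):

```go
package main

import "fmt"

// exitCodeForSignal applies the Linux convention: a process killed by
// signal N is reported with exit code 128 + N, so SIGKILL (9) -> 137.
func exitCodeForSignal(sig int) int {
	return 128 + sig
}

// signalFromExitCode decodes the convention in the other direction,
// as the shim does when it inspects the Sentry's exit status.
func signalFromExitCode(code int) (int, bool) {
	if code > 128 && code < 160 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	fmt.Println("SIGKILL exit code:", exitCodeForSignal(9))
	sig, killed := signalFromExitCode(137)
	fmt.Println("137 decodes to signal", sig, "killed:", killed)
}
```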
What if you used kill -15 (SIGTERM)?
If you sent a polite kill -15 instead of -9:
- The Sentry would catch the signal.
- The Sentry would "forward" that signal into the sandbox to the application's PID 1.
- The Sentry would wait for the application to shut down gracefully (closing DB connections, finishing writes).
- Once the app exits, the Sentry would shut itself down cleanly.
Which syscalls are supported?
The source of truth is pkg/sentry/syscalls/linux/linux64.go in the gVisor repository.