
gVisor - systrap

In gVisor, Systrap is the name of the Platform used to intercept system calls.

To understand Systrap, you have to understand the core challenge of gVisor: the Application thinks it is talking to a Linux Kernel, but gVisor needs to intercept ("hijack") those system calls and send them to the Sentry (gVisor's user-space kernel) instead.

The Context: What is a "Platform"?

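In gVisor, a "Platform" is the mechanism the Sentry uses to intercept the application's system calls (and page faults) and to switch execution between the application and the Sentry. Check out the gVisor - Platforms page for more details. This page focuses on Systrap.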

How Systrap Works (The Technical Flow)

Systrap relies on standard Linux kernel features: seccomp (which turns syscalls into SIGSYS signals) and shared memory.

Step A: The Trap (Seccomp)

When a container starts, gVisor sets up a seccomp filter on the application threads. This filter is configured with a special action: SECCOMP_RET_TRAP.

  • When the Application makes a syscall (e.g., write()), the host Linux kernel runs the seccomp filter before executing it.
  • The filter returns SECCOMP_RET_TRAP, so the kernel skips the syscall entirely and delivers a SIGSYS signal to the calling thread.

Source code: pkg/sentry/platform/systrap/filters.go
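As a minimal sketch of Steps A and B (plus the register write-back of Step D), the C program below uses the real Linux seccomp and signal APIs, assuming x86-64 Linux and glibc. Unlike a real Systrap filter, it traps only getpid() so the rest of the program keeps working, and a single SIGSYS handler stands in for both the stub and the Sentry. The names run_trap_demo, install_trap_filter, on_sigsys and FAKE_RESULT are hypothetical and are not taken from gVisor.

```c
#define _GNU_SOURCE
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <ucontext.h>
#include <unistd.h>

#define FAKE_RESULT 4242 /* hypothetical value our "Sentry" returns for getpid() */

static volatile sig_atomic_t trapped_nr = -1;

/* SIGSYS handler: stands in for both the stub and the Sentry. */
static void on_sigsys(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    ucontext_t *uc = (ucontext_t *)ctx;
    trapped_nr = info->si_syscall;                /* syscall the kernel skipped */
    uc->uc_mcontext.gregs[REG_RAX] = FAKE_RESULT; /* becomes the return value */
}

/* Trap getpid() with SECCOMP_RET_TRAP; allow every other syscall. */
static int install_trap_filter(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Runs the demo in a child, since a seccomp filter cannot be removed.
 * Stores what getpid() "returned" and which syscall trapped; 0 on success. */
int run_trap_demo(long *result, long *nr) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        long out[2] = {-1, -1};
        if (sigaction(SIGSYS, &sa, NULL) == 0 && install_trap_filter() == 0) {
            out[0] = syscall(SYS_getpid); /* never reaches the real getpid */
            out[1] = trapped_nr;
        }
        (void)!write(fds[1], out, sizeof out);
        _exit(0);
    }
    close(fds[1]);
    long in[2];
    ssize_t n = read(fds[0], in, sizeof in);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof in || in[0] == -1)
        return -1;
    *result = in[0];
    *nr = in[1];
    return 0;
}
```

The filter is installed in a forked child because a seccomp filter cannot be removed once installed. After run_trap_demo(&r, &nr) succeeds, r is 4242 and nr is SYS_getpid: the kernel never executed getpid(), and the "result" came from the value the handler wrote into RAX before the thread resumed.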

Step B: The Stub

Every application thread in Systrap has a "Stub" (a tiny piece of code) that handles this signal.

  • The stub catches the SIGSYS.
  • The stub then moves the thread into a "wait" state.

Step C: The Sentry Takes Over

The Sentry (the gVisor kernel) is notified through shared memory whenever a stub reports a trapped thread.

  • The Sentry sees that a thread is trapped.
  • It reads the syscall number and arguments (like the file descriptor and the buffer address) from the register state the stub saved, and reads the application's memory when it needs the data itself (e.g., the bytes to write).
  • The Sentry executes its own logic (e.g., "Am I allowed to write to this? Is this a network socket?").

Step D: The Return

Once the Sentry finishes the work, it writes the result (a return value or an error code) into the thread's saved registers and notifies the stub, which restores the registers and resumes the application as if the syscall had returned normally.
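The decode-and-reply logic of Steps C and D can be modelled in plain C. The register names and the negative-errno convention follow the x86-64 Linux syscall ABI (syscall number in RAX, arguments in RDI, RSI, RDX, R10, R8, R9, result back in RAX, errors as -errno). The struct saved_regs layout, the toy policy and SANDBOX_PID are hypothetical and far simpler than the Sentry's real syscall handling.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical snapshot of a trapped thread's registers (x86-64 names). */
struct saved_regs {
    uint64_t rax;                        /* syscall number in, result out */
    uint64_t rdi, rsi, rdx, r10, r8, r9; /* arguments 1..6 */
};

/* x86-64 syscall numbers for the two calls this toy understands. */
enum { NR_WRITE = 1, NR_GETPID = 39 };

#define SANDBOX_PID 1 /* hypothetical PID reported to the application */

/* Step C: decode the arguments and decide what the syscall should do. */
static int64_t emulate(const struct saved_regs *r) {
    switch (r->rax) {
    case NR_GETPID:
        return SANDBOX_PID;
    case NR_WRITE: {
        int fd = (int)r->rdi;    /* arg 1: file descriptor */
        uint64_t count = r->rdx; /* arg 3: byte count (arg 2, rsi, is the buffer) */
        if (fd != 1 && fd != 2)
            return -EBADF;       /* toy policy: only stdout/stderr exist */
        return (int64_t)count;   /* pretend every byte was written */
    }
    default:
        return -ENOSYS;          /* unknown syscall */
    }
}

/* Step D: put the result where the application expects it, in RAX. */
void handle_trapped_syscall(struct saved_regs *r) {
    r->rax = (uint64_t)emulate(r);
}
```

Returning -EBADF in RAX mirrors what the host kernel itself does; the C library wrapper in the application then turns it into a -1 return with errno set, so the application cannot tell the difference.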

Why is Systrap "Fast"?

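The old way (the ptrace platform) was slow because every syscall cost several host context switches: the application thread stopped, the host kernel woke the Sentry (acting as the tracer), and the Sentry then made further ptrace calls to read and write the thread's registers before resuming it. Systrap optimizes this using Shared Memory.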

  1. Syscall Queue: Systrap creates a shared memory region between the Sentry and the Application stubs, where trapped threads post their state.
  2. Spinning: Instead of sleeping until the host kernel wakes it up, a Sentry thread can "spin" (poll) on a memory address for a short time.
  3. Fewer Context Switches: When the App hits a syscall, its stub updates a value in shared memory. A spinning Sentry thread sees this update almost immediately, which avoids the overhead of the host kernel scheduling and waking the Sentry. (Delivering the SIGSYS itself still goes through the host kernel.)
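A minimal sketch of the spinning idea, assuming a hypothetical per-thread mailbox with an atomic state word: the "stub" publishes its request with a release store and the "Sentry" busy-waits with acquire loads, so neither side blocks in the host kernel. Two pthreads in one process stand in for the stub and the Sentry; the state names, struct mailbox and the toy emulation (doubling the argument) are illustrative only.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

enum { STATE_RUNNING, STATE_SYSCALL, STATE_DONE }; /* hypothetical states */

/* Hypothetical per-thread mailbox shared by the stub and the Sentry. */
struct mailbox {
    _Atomic int state;
    uint64_t nr, arg0; /* request, written by the stub */
    int64_t result;    /* reply, written by the Sentry */
};

/* Busy-wait until the state word equals `want`. No syscall is made. */
static void spin_until(_Atomic int *state, int want) {
    while (atomic_load_explicit(state, memory_order_acquire) != want) {
        /* spin */
    }
}

/* "Sentry" thread: poll the mailbox instead of sleeping in the kernel. */
static void *sentry_poll(void *arg) {
    struct mailbox *mb = arg;
    spin_until(&mb->state, STATE_SYSCALL);
    mb->result = (int64_t)(mb->arg0 * 2); /* toy emulation */
    atomic_store_explicit(&mb->state, STATE_DONE, memory_order_release);
    return NULL;
}

/* "Stub" side: publish the request, then spin until the reply lands. */
int64_t trapped_call(struct mailbox *mb, uint64_t nr, uint64_t arg0) {
    mb->nr = nr;
    mb->arg0 = arg0;
    atomic_store_explicit(&mb->state, STATE_SYSCALL, memory_order_release);
    spin_until(&mb->state, STATE_DONE);
    return mb->result;
}

/* One full round trip between the two threads; -1 if the thread fails. */
int64_t demo_roundtrip(uint64_t arg0) {
    struct mailbox mb = {.state = STATE_RUNNING};
    pthread_t sentry;
    if (pthread_create(&sentry, NULL, sentry_poll, &mb) != 0)
        return -1;
    int64_t r = trapped_call(&mb, 0, arg0);
    pthread_join(sentry, NULL);
    return r;
}
```

A spinning thread occupies a whole CPU, so a practical design only spins for a bounded time and then falls back to a blocking primitive such as a futex.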

Systrap vs. KVM

| Feature | Systrap | KVM |
|---|---|---|
| Requirement | Just standard Linux. | Needs /dev/kvm (Hardware VT-x). |
| Speed | Very fast (uses shared memory). | Fastest (uses hardware transitions). |
| Compatibility | Works everywhere (even inside VMs). | Often blocked in "Nested" environments. |
| Primary Tool | seccomp + SIGSYS. | CPU Virtualization extensions. |

KVM provides vCPUs to run the application. What is the equivalent in Systrap?

When using Systrap (the modern default for gVisor), you don't have hardware-virtualized vCPUs. Instead, gVisor creates a software-defined vCPU model using standard Linux process features: Seccomp filters, Unix Signals, and Shared Memory.

If KVM is "Hardware-assisted isolation," Systrap is "Signal-assisted isolation."

1. The Equivalent of the "vCPU": The Stub

In KVM, the "vCPU" is a hardware state. In Systrap, gVisor injects a tiny, highly optimized piece of assembly code called the Stub into the application's memory space.

  • Every thread in your application has a corresponding "Executor" thread managed by the Sentry.
  • The application code runs as a normal host process, but it is "trapped" inside this Stub logic.

2. The Equivalent of the "Trap" (LSTAR): Seccomp

In KVM, the LSTAR register tells the CPU to jump to the Sentry on a syscall. In Systrap, gVisor uses Seccomp-BPF.

  • gVisor applies a strict Seccomp filter to the application process.
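  • The filter says: "If this process makes a system call from application code, do not execute it. Instead, trigger a SIGSYS signal." (Syscalls issued by the stub's own code are let through, so the stub can still talk to the host kernel.)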
  • SIGSYS is the "Software VM-EXIT." It stops the application thread dead in its tracks.

3. The Equivalent of the "VM-EXIT": Signal Handling

When the application hits a SYSCALL:

  1. The Trigger: The Linux kernel sees the Seccomp violation and sends a SIGSYS to the thread.
  2. The Jump: Because gVisor has set up a signal handler, the CPU jumps to the Stub code (which is also running in the app's process).
  3. The Handover: The Stub saves all the CPU registers (RAX, RBX, etc.) into a Shared Memory region that it shares with the Sentry.
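The Handover step can be sketched as follows, assuming x86-64 Linux and glibc: a forked child plays the application, its SIGSYS handler copies the trapped syscall number and argument registers into a memfd-backed page, and the parent plays the Sentry by reading that page. The struct shared_regs layout and all function names are hypothetical; the filter traps only getcwd() and omits the architecture check for brevity.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <ucontext.h>
#include <unistd.h>

/* Hypothetical layout of the page shared by the stub and the Sentry. */
struct shared_regs {
    volatile int ready;
    long long nr, rdi, rsi, rdx;
};

static struct shared_regs *shm;

/* Stub side: copy the trapped thread's registers into shared memory. */
static void stub_sigsys(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    greg_t *g = ((ucontext_t *)ctx)->uc_mcontext.gregs;
    shm->nr = info->si_syscall;
    shm->rdi = g[REG_RDI]; /* argument 1 */
    shm->rsi = g[REG_RSI]; /* argument 2 */
    shm->rdx = g[REG_RDX]; /* argument 3 */
    shm->ready = 1;
    g[REG_RAX] = -ENOSYS;  /* placeholder result; a real Sentry supplies it */
}

/* Trap only getcwd() (x86-64; architecture check omitted for brevity). */
static int trap_getcwd(void) {
    struct sock_filter f[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getcwd, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {(unsigned short)(sizeof f / sizeof f[0]), f};
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Child = application calling getcwd(buf, 64); parent = Sentry reading
 * the shared page. Returns the page, or NULL on failure. */
struct shared_regs *capture_trapped_regs(void) {
    int fd = memfd_create("systrap-demo", 0);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, 4096) != 0) {
        close(fd);
        return NULL;
    }
    shm = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (shm == MAP_FAILED)
        return NULL;
    pid_t pid = fork();
    if (pid < 0)
        return NULL;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = stub_sigsys;
        sa.sa_flags = SA_SIGINFO;
        char buf[64];
        if (sigaction(SIGSYS, &sa, NULL) == 0 && trap_getcwd() == 0)
            syscall(SYS_getcwd, buf, sizeof buf);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shm->ready ? shm : NULL;
}
```

Because the page is mapped MAP_SHARED, the parent sees the child's writes directly: after capture_trapped_regs() returns, nr is SYS_getcwd, rdi holds the address of the child's buffer and rsi holds 64, the buffer size.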

4. The Equivalent of "KVM Shared State": Shared Memory

In KVM, the Sentry reads registers from a kernel structure. In Systrap, the Sentry and the Application share a memory segment (usually created via memfd).

  • The Stub writes: "I am trying to do a read() syscall. Here are my registers. I am now going to sleep."
  • The Stub then uses a fast synchronization primitive (like a futex or a specialized memory spinlock) to "wake up" the Sentry.
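The futex half of that synchronization might look like this minimal sketch: the stub and the Sentry share one 32-bit state word in a MAP_SHARED page, each side sleeps in FUTEX_WAIT until the word changes, and the other side calls FUTEX_WAKE after updating it. The state values, struct channel and the toy emulation (adding one) are hypothetical; a forked child stands in for the stub and the parent for the Sentry.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

enum { IDLE, REQUEST, REPLY }; /* hypothetical channel states */

struct channel {              /* lives in a MAP_SHARED page */
    _Atomic uint32_t state;   /* the futex word */
    int64_t arg, result;
};

/* Sleep only while the word still equals `expected`. Not FUTEX_PRIVATE:
 * the word is shared between two processes. */
static void futex_wait(_Atomic uint32_t *w, uint32_t expected) {
    syscall(SYS_futex, (uint32_t *)w, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static void futex_wake(_Atomic uint32_t *w) {
    syscall(SYS_futex, (uint32_t *)w, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* Re-check in a loop: futex wakeups can be spurious. */
static void wait_for(struct channel *c, uint32_t want) {
    uint32_t s;
    while ((s = atomic_load(&c->state)) != want)
        futex_wait(&c->state, s);
}

/* Stub: post the request, wake the Sentry, sleep until the reply. */
static int64_t stub_call(struct channel *c, int64_t arg) {
    c->arg = arg;
    atomic_store(&c->state, REQUEST);
    futex_wake(&c->state);
    wait_for(c, REPLY);
    return c->result;
}

/* Sentry: sleep until a request arrives, answer it, wake the stub. */
static void sentry_serve_one(struct channel *c) {
    wait_for(c, REQUEST);
    c->result = c->arg + 1; /* toy emulation */
    atomic_store(&c->state, REPLY);
    futex_wake(&c->state);
}

/* Returns the result computed for `arg`, or -1 on failure. */
int64_t futex_roundtrip(int64_t arg) {
    struct channel *c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED)
        return -1;
    atomic_init(&c->state, IDLE);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(stub_call(c, arg) == arg + 1 ? 0 : 1);
    sentry_serve_one(c);
    int status = 0;
    waitpid(pid, &status, 0);
    int64_t r = (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? c->result : -1;
    munmap(c, sizeof *c);
    return r;
}
```

Each wake-up here is a trip through the host kernel, which is exactly the cost that a short period of spinning on the shared word tries to avoid.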

5. The Equivalent of "Sentry Processing": Sentry Executor

The Sentry has a pool of host threads waiting for these notifications from the stubs.

  • One Sentry thread wakes up, sees the data in the Shared Memory, and performs the syscall emulation (talking to the Gofer, etc.).
  • Once done, it writes the result back into the Shared Memory and signals the Stub to wake up.
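A toy model of such a pool, assuming one hypothetical mailbox per trapped thread: several host threads scan the mailboxes, claim a pending request with compare-and-swap so that no two workers handle the same one, write the result and flip the state to REPLY (standing in for "signal the Stub"). The real Sentry is written in Go and is organized differently; struct sentry_pool, the state names and serve_all are illustrative only.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

enum { EMPTY, REQUEST, CLAIMED, REPLY }; /* hypothetical mailbox states */

#define NMAILBOX 8
#define NWORKERS 3

struct mailbox { _Atomic int state; int64_t arg, result; };

struct sentry_pool {
    struct mailbox box[NMAILBOX];
    _Atomic int remaining; /* requests not yet answered */
};

/* One Sentry host thread: claim pending requests with CAS, answer them. */
static void *sentry_worker(void *arg) {
    struct sentry_pool *p = arg;
    while (atomic_load(&p->remaining) > 0) {
        for (int i = 0; i < NMAILBOX; i++) {
            struct mailbox *m = &p->box[i];
            int expected = REQUEST;
            if (!atomic_compare_exchange_strong(&m->state, &expected, CLAIMED))
                continue;                /* empty, or another worker won */
            m->result = m->arg * 10;     /* toy emulation */
            atomic_store(&m->state, REPLY); /* "signal the stub" */
            atomic_fetch_sub(&p->remaining, 1);
        }
    }
    return NULL;
}

/* Posts one request per mailbox, lets the pool serve them, and returns
 * the sum of all results (or -1 on failure). */
int64_t serve_all(void) {
    static struct sentry_pool p;
    for (int i = 0; i < NMAILBOX; i++) {
        p.box[i].arg = i;
        atomic_store(&p.box[i].state, REQUEST);
    }
    atomic_store(&p.remaining, NMAILBOX);
    pthread_t t[NWORKERS];
    int started = 0;
    for (; started < NWORKERS; started++)
        if (pthread_create(&t[started], NULL, sentry_worker, &p) != 0)
            break;
    if (started == 0)
        return -1;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    int64_t sum = 0;
    for (int i = 0; i < NMAILBOX; i++) {
        if (atomic_load(&p.box[i].state) != REPLY)
            return -1;
        sum += p.box[i].result;
    }
    return sum;
}
```

serve_all() posts requests with arguments 0 through 7 and each worker multiplies the argument by 10, so it returns 280 no matter which worker served which mailbox.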

Comparison: KVM vs. Systrap

| Component | KVM Platform | Systrap Platform |
|---|---|---|
| Guest Context | Virtual Machine (Hardware) | Process Stub (Software) |
| Syscall Trigger | SYSCALL instruction | SYSCALL instruction → seccomp filter |
| Interception | VM-EXIT (Hardware) | SIGSYS Signal (Software) |
| Data Exchange | KVM Register State | Shared Memory Region |
| Hardware Requirement | Intel VT-x / AMD-V | None (works on any Linux) |
| Performance | Very High | High (faster than ptrace) |

Why is Systrap the default now?

  1. Universality: It works on Cloud VMs (like AWS EC2 or Google Cloud) that don't always support "Nested Virtualization" (running a VM inside a VM).
  2. Startup Speed: Setting up a KVM instance has a small but measurable overhead. Systrap starts as fast as a regular process.
  3. Simpler Deployment: It doesn't require /dev/kvm permissions, making it easier to deploy in restricted environments.

Systrap is the "engine" that allows gVisor to be a sandbox. It uses Linux seccomp to "trap" application attempts to talk to the host kernel and redirects those attempts into the Sentry using high-speed shared memory communication.

It is the reason gVisor can provide high security with much better performance than traditional process-tracing tools.