gVisor - systrap
In gVisor, Systrap is one of the Platforms used to intercept system calls, and it is the default Platform in current releases.
To understand Systrap, you have to understand the core challenge of gVisor: The Application thinks it is talking to a Linux Kernel, but gVisor needs to "hijack" those system calls and send them to the Sentry (the user-space kernel) instead.
The Context: What is a "Platform"?
In gVisor, a "Platform" is the specific mechanism used to intercept syscalls. Check out the gVisor - Platforms page for more details. This page focuses on systrap.
How Systrap Works (The Technical Flow)
Systrap relies on standard Linux kernel features: seccomp, signals (SIGSYS), and shared memory.
Step A: The Trap (Seccomp)
When a container starts, gVisor sets up a seccomp filter on the application threads. This filter is configured with a special action: SECCOMP_RET_TRAP.
- When the Application tries to make a syscall (e.g., write()), the Linux kernel hits this seccomp rule.
- Instead of executing the syscall, the kernel stops the application thread and sends it a SIGSYS signal.
Source code: pkg/sentry/platform/systrap/filters.go
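To make the trap concrete, here is a minimal Go sketch of a classic-BPF seccomp filter of the kind described above, plus a tiny interpreter so the filter can be exercised without installing it. The opcode, offset and return-value constants come from the Linux UAPI headers; the allowlist (rt_sigreturn and futex), the architecture check, and the helper names buildFilter and run are illustrative assumptions, not gVisor's actual filter.

```go
package main

import "fmt"

// Constants from the Linux UAPI headers (linux/seccomp.h, linux/filter.h,
// linux/audit.h).
const (
	seccompRetKillProcess = 0x80000000
	seccompRetTrap        = 0x00030000
	seccompRetAllow       = 0x7fff0000

	bpfLdWAbs = 0x20 // BPF_LD | BPF_W | BPF_ABS
	bpfJeqK   = 0x15 // BPF_JMP | BPF_JEQ | BPF_K
	bpfRetK   = 0x06 // BPF_RET | BPF_K

	auditArchX86_64 = 0xc000003e

	offNr   = 0 // offsetof(struct seccomp_data, nr)
	offArch = 4 // offsetof(struct seccomp_data, arch)

	// x86-64 syscall numbers.
	sysWrite       = 1
	sysRtSigreturn = 15
	sysFutex       = 202
)

// sockFilter mirrors struct sock_filter.
type sockFilter struct {
	Code   uint16
	Jt, Jf uint8
	K      uint32
}

// seccompData holds the two struct seccomp_data fields this filter reads.
type seccompData struct {
	Nr   int32
	Arch uint32
}

// buildFilter kills the process on a foreign architecture, allows the
// syscalls in allow, and returns SECCOMP_RET_TRAP (-> SIGSYS) for the rest.
func buildFilter(allow []uint32) []sockFilter {
	prog := []sockFilter{
		{Code: bpfLdWAbs, K: offArch},
		{Code: bpfJeqK, Jt: 1, Jf: 0, K: auditArchX86_64},
		{Code: bpfRetK, K: seccompRetKillProcess},
		{Code: bpfLdWAbs, K: offNr},
	}
	for i, nr := range allow {
		// On a match, jump over the remaining checks and the TRAP to ALLOW.
		prog = append(prog, sockFilter{Code: bpfJeqK, Jt: uint8(len(allow) - i), K: nr})
	}
	return append(prog,
		sockFilter{Code: bpfRetK, K: seccompRetTrap},
		sockFilter{Code: bpfRetK, K: seccompRetAllow},
	)
}

// run interprets the small subset of classic BPF used by buildFilter.
func run(prog []sockFilter, d seccompData) uint32 {
	var acc uint32
	for pc := 0; pc < len(prog); pc++ {
		ins := prog[pc]
		switch ins.Code {
		case bpfLdWAbs:
			switch ins.K {
			case offNr:
				acc = uint32(d.Nr)
			case offArch:
				acc = d.Arch
			default:
				return seccompRetKillProcess
			}
		case bpfJeqK:
			if acc == ins.K {
				pc += int(ins.Jt)
			} else {
				pc += int(ins.Jf)
			}
		case bpfRetK:
			return ins.K
		default:
			return seccompRetKillProcess
		}
	}
	return seccompRetKillProcess
}

func main() {
	// Hypothetical allowlist: a couple of syscalls a stub makes for itself.
	prog := buildFilter([]uint32{sysRtSigreturn, sysFutex})
	cases := []struct {
		name string
		data seccompData
	}{
		{"write", seccompData{Nr: sysWrite, Arch: auditArchX86_64}},
		{"futex", seccompData{Nr: sysFutex, Arch: auditArchX86_64}},
		{"foreign arch", seccompData{Nr: sysWrite, Arch: 0}},
	}
	for _, c := range cases {
		fmt.Printf("%-12s -> %#x\n", c.name, run(prog, c.data))
	}
}
```

A real filter would be handed to the kernel with seccomp(2) or prctl(PR_SET_SECCOMP); the interpreter here only models the decision the kernel makes for each syscall.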
Step B: The Stub
Every application thread in Systrap has a "Stub" (a tiny piece of code) that handles this signal.
- The stub catches the SIGSYS.
- The stub then moves the thread into a "wait" state.
Step C: The Sentry Takes Over
The Sentry (the gVisor kernel) watches these stubs through the memory it shares with them.
- The Sentry sees that a thread is trapped.
- It reads the syscall arguments (like the file descriptor and the buffer address) directly from the application's registers or memory.
- The Sentry executes its own logic (e.g., "Am I allowed to write to this? Is this a network socket?").
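Reading the arguments "from the registers" follows the x86-64 Linux syscall ABI: the syscall number is in RAX and the six arguments are in RDI, RSI, RDX, R10, R8 and R9, in that order. The sketch below decodes a captured register set that way. The regs and syscallRequest types are hypothetical stand-ins for the Sentry's own structures, and the buffer address is only a pointer into application memory that the Sentry would still have to copy from.

```go
package main

import "fmt"

// regs is a hypothetical subset of the x86-64 register file captured when
// the application thread trapped.
type regs struct {
	Rax, Rdi, Rsi, Rdx, R10, R8, R9 uint64
}

// syscallRequest is what the Sentry needs in order to emulate the call.
type syscallRequest struct {
	Nr   uint64
	Args [6]uint64
}

// decode applies the x86-64 Linux syscall ABI: number in RAX, arguments in
// RDI, RSI, RDX, R10, R8, R9.
func decode(r regs) syscallRequest {
	return syscallRequest{
		Nr:   r.Rax,
		Args: [6]uint64{r.Rdi, r.Rsi, r.Rdx, r.R10, r.R8, r.R9},
	}
}

func main() {
	// write(1, buf, 13) as a trapped thread might have left it; the buffer
	// address is made up.
	req := decode(regs{Rax: 1, Rdi: 1, Rsi: 0x7f0000001000, Rdx: 13})
	fmt.Printf("nr=%d fd=%d buf=%#x count=%d\n", req.Nr, req.Args[0], req.Args[1], req.Args[2])
}
```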
Step D: The Return
Once the Sentry finishes the work, it writes the result (a return value or an error code) into the thread's saved register state and wakes the stub, which restores the registers so the application thread resumes right after its syscall.
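On Linux the result goes back in RAX: a success value is returned as-is, and a failure is returned as a negated errno, with raw values from -4095 to -1 reserved for errors. A minimal sketch of that convention, assuming x86-64 and using hypothetical helper names (setResult, isError):

```go
package main

import (
	"fmt"
	"syscall"
)

// setResult writes a syscall result into RAX using the Linux convention:
// success values as-is, failures as the negated errno.
func setResult(rax *uint64, val uint64, errno syscall.Errno) {
	if errno != 0 {
		*rax = uint64(-int64(errno))
		return
	}
	*rax = val
}

// isError reports whether a raw RAX value encodes an error; the kernel
// reserves [-4095, -1] for negated errno values.
func isError(rax uint64) (bool, syscall.Errno) {
	if v := int64(rax); v < 0 && v >= -4095 {
		return true, syscall.Errno(-v)
	}
	return false, 0
}

func main() {
	var rax uint64
	setResult(&rax, 13, 0) // e.g. write() wrote 13 bytes
	fmt.Println("success:", rax)

	setResult(&rax, 0, syscall.EBADF) // e.g. write() to a bad fd
	bad, errno := isError(rax)
	fmt.Println("error:", bad, errno)
}
```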
Why is Systrap "Fast"?
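The old way (ptrace) was slow because every intercepted syscall needed several host context switches: the application thread stops, the host kernel wakes the Sentry (the tracer), and the Sentry makes further ptrace calls to read and write the thread's registers before resuming it. Systrap optimizes this using Shared Memory.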
- Syscall Queue: Systrap creates a shared memory region between the Sentry and the Application stubs.
- Spinning: Instead of the Sentry "sleeping" and waiting for a slow interrupt, the Sentry can "spin" (poll) on a memory address.
- Fewer Context Switches: When the App hits a syscall, it updates a value in shared memory. A Sentry thread that is already spinning sees this update almost immediately, which avoids the heavy overhead of the host kernel having to schedule and wake up the Sentry process.
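The spin-versus-sleep trade-off can be sketched in Go with an atomic flag: the waiter spins for a bounded number of iterations and only then blocks. Here sync.Cond stands in for the host futex a real implementation would sleep on, and the doorbell type and spin budget are illustrative assumptions rather than Systrap's actual code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// doorbell models a flag in shared memory plus a blocking fallback
// (sync.Cond standing in for a futex).
type doorbell struct {
	flag atomic.Uint32
	mu   sync.Mutex
	cond *sync.Cond
}

func newDoorbell() *doorbell {
	d := &doorbell{}
	d.cond = sync.NewCond(&d.mu)
	return d
}

// ring sets the flag, then wakes any waiter that already went to sleep.
func (d *doorbell) ring() {
	d.flag.Store(1)
	d.mu.Lock()
	d.cond.Broadcast()
	d.mu.Unlock()
}

// wait spins up to `spins` times; it returns true if the flag was seen while
// spinning and false if it had to block.
func (d *doorbell) wait(spins int) bool {
	for i := 0; i < spins; i++ {
		if d.flag.Load() == 1 {
			return true
		}
		runtime.Gosched() // yield so the ringer can run even on one CPU
	}
	d.mu.Lock()
	for d.flag.Load() == 0 { // re-check under the lock: no lost wake-up
		d.cond.Wait()
	}
	d.mu.Unlock()
	return false
}

func main() {
	early := newDoorbell()
	early.ring()
	fmt.Println("rung before waiting, seen while spinning:", early.wait(1000))

	late := newDoorbell()
	go func() {
		time.Sleep(50 * time.Millisecond)
		late.ring()
	}()
	fmt.Println("rung after 50ms, seen while spinning:", late.wait(10))
}
```

Spinning burns CPU while it waits, so the spin budget is a tuning knob: it only pays off when the other side usually answers within the spin window.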
Systrap vs. KVM
| Feature | Systrap | KVM |
|---|---|---|
| Requirement | Just standard Linux. | Needs /dev/kvm (Hardware VT-x). |
| Speed | Very fast (uses shared memory). | Fastest (uses hardware transitions). |
| Compatibility | Works everywhere (even inside VMs). | Often blocked in "Nested" environments. |
| Primary Tool | seccomp + SIGSYS. | CPU Virtualization extensions. |
KVM provides vCPUs to run the application. What is the equivalent in Systrap?
When using Systrap (the modern default for gVisor), you don't have hardware-virtualized vCPUs. Instead, gVisor creates a software-defined vCPU model using standard Linux process features: Seccomp filters, Unix Signals, and Shared Memory.
If KVM is "Hardware-assisted isolation," Systrap is "Signal-assisted isolation."
1. The Equivalent of the "vCPU": The Stub
In KVM, the "vCPU" is a hardware state. In Systrap, gVisor injects a tiny, highly optimized piece of assembly code called the Stub into the application's memory space.
- Every thread in your application has a corresponding "Executor" thread managed by the Sentry.
- The application code runs as a normal host process, but it is "trapped" inside this Stub logic.
2. The Equivalent of the "Trap" (LSTAR): Seccomp
In KVM, the LSTAR register tells the CPU to jump to the Sentry on a syscall. In Systrap, gVisor uses Seccomp-BPF.
- gVisor applies a strict Seccomp filter to the application process.
- The filter says: "If this process tries to make a system call (other than the few the Stub itself needs), do not execute it. Instead, trigger a SIGSYS signal."
- SIGSYS is the "Software VM-EXIT." It stops the application thread dead in its tracks.
3. The Equivalent of the "VM-EXIT": Signal Handling
When the application hits a SYSCALL:
- The Trigger: The Linux kernel runs the seccomp filter, which returns SECCOMP_RET_TRAP, so the kernel sends a SIGSYS to the thread.
- The Jump: Because gVisor has set up a signal handler, the CPU jumps to the Stub code (which is also running in the app's process).
- The Handover: The Stub saves all the CPU registers (RAX, RBX, etc.) into a Shared Memory region that it shares with the Sentry.
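Because the shared region is just memory that both sides map, the Stub and the Sentry must agree on a fixed layout. A sketch of one thread's slot, assuming a hypothetical layout of a state word followed by the 16 x86-64 general-purpose registers, all stored as little-endian uint64 values:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical slot layout: state word, then RAX, RBX, RCX, RDX, RSI, RDI,
// RBP, RSP, R8..R15.
const (
	offState = 0
	offRegs  = 8
	numRegs  = 16
	slotSize = offRegs + numRegs*8

	stateSyscallPending = 1
)

// stubSave models the Stub's handler: copy the registers into the slot and
// write the state word last. (Single-threaded demo; real shared memory would
// need an atomic store with release semantics for the state word.)
func stubSave(slot []byte, regs [numRegs]uint64) {
	for i, v := range regs {
		binary.LittleEndian.PutUint64(slot[offRegs+i*8:], v)
	}
	binary.LittleEndian.PutUint64(slot[offState:], stateSyscallPending)
}

// sentryLoad models the Sentry reading the same bytes back.
func sentryLoad(slot []byte) (state uint64, regs [numRegs]uint64) {
	state = binary.LittleEndian.Uint64(slot[offState:])
	for i := range regs {
		regs[i] = binary.LittleEndian.Uint64(slot[offRegs+i*8:])
	}
	return state, regs
}

func main() {
	slot := make([]byte, slotSize) // one slot of the (simulated) shared region
	var regs [numRegs]uint64
	regs[5] = 3 // RDI = fd 3; RAX stays 0, the x86-64 number for read()
	stubSave(slot, regs)
	state, seen := sentryLoad(slot)
	fmt.Println("state:", state, "nr:", seen[0], "fd:", seen[5])
}
```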
4. The Equivalent of "KVM Shared State": Shared Memory
In KVM, the Sentry reads registers from a kernel structure. In Systrap, the Sentry and the Application share a memory segment (usually created via memfd).
- The Stub writes: "I am trying to do a read() syscall. Here are my registers. I am now going to sleep."
- The Stub then uses a fast synchronization primitive (like a futex or a specialized memory spinlock) to "wake up" the Sentry.
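The handoff in this section can be modelled as a small state machine on a shared word: the Stub writes its request, flips the state to "pending" and waits; the Sentry sees "pending", does the work, writes the result and flips the state to "done". In this Go sketch goroutines stand in for the Stub and the Sentry, a spin loop stands in for the futex/spinlock wake-up, and the sharedCtx type, the state names and the doubling "emulation" are all hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Hypothetical states of one shared context slot.
const (
	stateRunning uint32 = iota // application code is executing
	statePending               // Stub posted a syscall and is waiting
	stateDone                  // Sentry wrote the result
)

// sharedCtx stands in for one thread's slot in the shared region.
type sharedCtx struct {
	state atomic.Uint32
	nr    uint64 // syscall number posted by the Stub
	arg0  uint64 // first argument posted by the Stub
	ret   uint64 // result written by the Sentry
}

// stubSyscall posts a request, then waits until the Sentry answers.
func stubSyscall(c *sharedCtx, nr, arg0 uint64) uint64 {
	c.nr, c.arg0 = nr, arg0
	c.state.Store(statePending) // publish only after the payload is written
	for c.state.Load() != stateDone {
		runtime.Gosched()
	}
	c.state.Store(stateRunning)
	return c.ret
}

// sentryServe waits for one request, emulates it, and publishes the result.
func sentryServe(c *sharedCtx, emulate func(nr, arg0 uint64) uint64) {
	for c.state.Load() != statePending {
		runtime.Gosched()
	}
	c.ret = emulate(c.nr, c.arg0)
	c.state.Store(stateDone)
}

func main() {
	c := &sharedCtx{}
	done := make(chan struct{})
	go func() {
		// Hypothetical emulation: just double the first argument.
		sentryServe(c, func(nr, arg0 uint64) uint64 { return arg0 * 2 })
		close(done)
	}()
	fmt.Println(stubSyscall(c, 0, 21))
	<-done
}
```

The atomic store of the state word is what publishes the payload: under the Go memory model, a load that observes the new state also observes the fields written before that store.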
5. The Equivalent of "Sentry Processing": Sentry Executor
The Sentry has a pool of host threads waiting for these wake-ups from the Stubs.
- One Sentry thread wakes up, sees the data in the Shared Memory, and performs the syscall emulation (talking to the Gofer, etc.).
- Once done, it writes the result back into the Shared Memory and signals the Stub to wake up.
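The pool described above can be sketched with a channel as the request queue and a fixed number of worker goroutines, each of which takes whichever trapped thread is next and answers it. The request type, the reply channel (standing in for "write the result and wake the Stub") and the emulate callback are hypothetical; this is not gVisor's actual scheduling code.

```go
package main

import (
	"fmt"
	"sync"
)

// request is a hypothetical trapped syscall waiting in the shared queue.
type request struct {
	nr    uint64
	reply chan uint64 // stands in for "write result, wake the Stub"
}

// startSentryPool launches n workers that drain the queue until it is
// closed; any worker may serve any application thread.
func startSentryPool(n int, queue <-chan request, emulate func(uint64) uint64) *sync.WaitGroup {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range queue {
				req.reply <- emulate(req.nr)
			}
		}()
	}
	return &wg
}

func main() {
	queue := make(chan request)
	workers := startSentryPool(4, queue, func(nr uint64) uint64 { return nr + 100 })

	// Eight application threads each trap once and wait for their answer.
	var apps sync.WaitGroup
	results := make([]uint64, 8)
	for tid := 0; tid < 8; tid++ {
		apps.Add(1)
		go func(tid int) {
			defer apps.Done()
			reply := make(chan uint64, 1)
			queue <- request{nr: uint64(tid), reply: reply}
			results[tid] = <-reply
		}(tid)
	}
	apps.Wait()
	close(queue)
	workers.Wait()
	fmt.Println(results)
}
```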
Comparison: KVM vs. Systrap
| Component | KVM Platform | Systrap Platform |
|---|---|---|
| Guest Context | Virtual Machine (Hardware) | Process Stub (Software) |
| Syscall Trigger | SYSCALL instruction | SYSCALL instruction + Seccomp filter |
| Interception | VM-EXIT (Hardware) | SIGSYS Signal (Software) |
| Data Exchange | KVM Register State | Shared Memory Region |
| Hardware Req. | Intel VT-x / AMD-V | None (Works on any Linux) |
| Performance | Very High | High (Faster than ptrace) |
Why is Systrap the default now?
- Universality: It works on Cloud VMs (like AWS EC2 or Google Cloud) that don't always support "Nested Virtualization" (running a VM inside a VM).
- Startup Speed: Setting up a KVM instance has a small but measurable overhead. Systrap starts as fast as a regular process.
- Lower Complexity: It doesn't require /dev/kvm permissions, making it easier to deploy in restricted environments.
Systrap is the "engine" that allows gVisor to be a sandbox. It uses Linux seccomp to "trap" application attempts to talk to the host kernel and redirects those attempts into the Sentry using high-speed shared memory communication.
It is the reason gVisor can provide high security with much better performance than the older ptrace-based platform.