gVisor - nvproxy

nvproxy is a specialized subsystem within gVisor that allows sandboxed applications to access NVIDIA GPUs while maintaining the security boundaries of the sandbox.

In a standard container (like Docker), giving a container GPU access usually means passing the host's NVIDIA device files (/dev/nvidiactl, /dev/nvidia-uvm, and /dev/nvidiaX) directly into the container. Code in the container can then send arbitrary requests straight to the NVIDIA kernel driver, which has a very large and complex attack surface. nvproxy closes that gap by acting as a "security filter" between the application and the driver.

TL;DR: nvproxy is used to virtualize GPU access

  • Virtualize kernel access (core gVisor): app => syscalls => Sentry => limited syscalls => host kernel
  • Virtualize GPU access (nvproxy): app => GPU ioctls => Sentry (nvproxy) => limited GPU ioctls => NVIDIA kernel module => GPU

The Problem: The GPU Attack Surface

The NVIDIA driver is a massive piece of software running in the host kernel. Applications interact with it primarily through a single system call: ioctl.

  • NVIDIA's ioctl interface has hundreds of different commands and complex data structures.
  • In a traditional setup, a compromised container could send a "malformed" ioctl to the NVIDIA driver to trigger a bug, crash the host, or escape the container.
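To make "hundreds of different commands" concrete: on Linux, the ioctl request number itself packs a command number, a one-byte "magic" type, the parameter size, and a read/write direction. The Go sketch below encodes and decodes that layout (the generic asm-generic/ioctl.h layout used on x86-64 and arm64). The 'F' magic and command 0x2A are assumptions chosen for illustration, not a statement of which commands NVIDIA defines.

```go
package main

import "fmt"

// Generic Linux ioctl request layout (asm-generic/ioctl.h, used on x86-64 and arm64):
// bits 0-7 command number, bits 8-15 type ("magic"), bits 16-29 argument size,
// bits 30-31 direction.
const (
	iocNRBits   = 8
	iocTypeBits = 8
	iocSizeBits = 14

	iocNRShift   = 0
	iocTypeShift = iocNRShift + iocNRBits     // 8
	iocSizeShift = iocTypeShift + iocTypeBits // 16
	iocDirShift  = iocSizeShift + iocSizeBits // 30

	iocWrite = 1 // userspace writes parameters to the kernel
	iocRead  = 2 // kernel writes results back to userspace
)

// ioc builds a request number the same way the kernel's _IOC macro does.
func ioc(dir, typ, nr, size uint32) uint32 {
	return dir<<iocDirShift | typ<<iocTypeShift | nr<<iocNRShift | size<<iocSizeShift
}

// Decoders mirroring _IOC_NR, _IOC_TYPE, _IOC_SIZE and _IOC_DIR.
func iocNR(cmd uint32) uint32   { return (cmd >> iocNRShift) & (1<<iocNRBits - 1) }
func iocType(cmd uint32) uint32 { return (cmd >> iocTypeShift) & (1<<iocTypeBits - 1) }
func iocSize(cmd uint32) uint32 { return (cmd >> iocSizeShift) & (1<<iocSizeBits - 1) }
func iocDir(cmd uint32) uint32  { return cmd >> iocDirShift }

func main() {
	// Illustrative only: magic 'F' (assumed here for NVIDIA's control device),
	// command 0x2A, a 32-byte parameter struct, read+write.
	cmd := ioc(iocRead|iocWrite, 'F', 0x2A, 32)
	fmt.Printf("cmd=%#x nr=%#x type=%q size=%d dir=%d\n",
		cmd, iocNR(cmd), rune(iocType(cmd)), iocSize(cmd), iocDir(cmd))
	// prints: cmd=0xc020462a nr=0x2a type='F' size=32 dir=3
}
```

A proxy sitting in front of the driver can use these fields to route a request to the right handler and to know how many bytes of parameters to copy from the application before looking at them.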

The Solution: How nvproxy Works

Instead of letting the application talk directly to the host’s GPU driver, gVisor intercepts all GPU-related communication.

  1. Interception: When the application inside the sandbox tries to open /dev/nvidia0 or calls an ioctl on it, the gVisor Sentry (the "kernel" of the sandbox) intercepts that call.
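  2. Validation (The Proxy): The nvproxy module inside the Sentry decodes the ioctl and rejects any command it does not explicitly support. For supported commands, it copies the arguments out of the application's memory and checks them (for example, that they are well-formed and that any memory they refer to belongs to the application), so the host driver only receives arguments the Sentry has vetted.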
  3. Translation: If the call is valid, nvproxy issues the corresponding ioctl to the real host NVIDIA driver on behalf of the application, translating sandbox-side values such as file descriptors into their host equivalents where needed.
  4. Memory Mapping: For performance, GPU applications map GPU buffers directly into their address space (mmap). nvproxy mediates these mappings so that the application only sees its own GPU memory and cannot touch the host's memory.

Version Matching (The Hard Part)

NVIDIA frequently updates its drivers, and the interface (UAPI) between the user-space driver libraries (such as libcuda.so) and the kernel driver is not stable: it can change from one driver release to the next.

For nvproxy to work, it must "speak" the exact same version of the NVIDIA protocol as the driver installed on the host. Because of this:

  • nvproxy keeps a separate definition of the NVIDIA UAPI (ioctl numbers, structure layouts, and supported commands) for each driver version it supports.
  • When gVisor starts, it detects the version of the NVIDIA driver installed on the host.
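  • It then selects the UAPI definition that matches that driver. If the host driver's version is not one nvproxy supports, there is no matching definition to use; runsc nvproxy list-supported-drivers prints the supported versions.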
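A hypothetical sketch of per-version ABI selection: each supported driver version maps to a constructor for its ABI table, a newer version can start from an older one and add what changed, and an unknown host version is an error. The version strings, command numbers, and names (driverABI, abiFor, supportedVersions) are invented for illustration; the real definitions under pkg/sentry/devices/nvproxy carry far more than a set of command numbers.

```go
package main

import (
	"fmt"
	"sort"
)

// driverABI is a hypothetical, heavily simplified per-version ABI table:
// just the set of ioctl command numbers the proxy accepts.
type driverABI struct {
	allowed map[uint32]bool
}

// abis maps a host driver version string to a constructor for its ABI.
// Constructors return fresh tables, so callers may modify the result.
var abis = map[string]func() *driverABI{}

func init() {
	// Version strings and command numbers are invented for illustration.
	v100 := func() *driverABI {
		return &driverABI{allowed: map[uint32]bool{0x2A: true, 0x2B: true}}
	}
	v101 := func() *driverABI {
		abi := v100()            // start from the previous version...
		abi.allowed[0x4E] = true // ...and add the command this release introduced
		return abi
	}
	abis["100.0.0"] = v100
	abis["101.0.0"] = v101
}

// abiFor picks the ABI matching the detected host driver version.
func abiFor(hostVersion string) (*driverABI, error) {
	ctor, ok := abis[hostVersion]
	if !ok {
		return nil, fmt.Errorf("unsupported NVIDIA driver version %q", hostVersion)
	}
	return ctor(), nil
}

// supportedVersions lists known versions, in the spirit of list-supported-drivers.
// Plain string sorting works for these sample strings but is not true version ordering.
func supportedVersions() []string {
	vs := make([]string, 0, len(abis))
	for v := range abis {
		vs = append(vs, v)
	}
	sort.Strings(vs)
	return vs
}

func main() {
	abi, err := abiFor("101.0.0")
	fmt.Println(err == nil, abi.allowed[0x4E]) // true true
	_, err = abiFor("99.0.0")
	fmt.Println(err)                 // unsupported NVIDIA driver version "99.0.0"
	fmt.Println(supportedVersions()) // [100.0.0 101.0.0]
}
```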

Why use nvproxy?

  • Security (AI/ML Workloads): Most modern AI/ML workloads (PyTorch, TensorFlow) require GPUs. These workloads often run untrusted code (e.g., a multi-tenant cloud or processing user-uploaded models). nvproxy allows you to run these in a hardened sandbox.
  • Isolation: Even if a CUDA application crashes or tries to exploit the driver, it can only reach the ioctls nvproxy permits, with arguments the Sentry has validated, which makes it far harder to harm the host machine or other containers.
  • Standardization: It allows gVisor to support standard NVIDIA libraries (like libcuda.so) without modification.

Performance

Because nvproxy has to intercept and "sanitize" every ioctl, each driver call costs a little more than it would outside the sandbox. However, the most performance-critical part of GPU work (the actual computation) runs on the GPU hardware itself, and data moves through direct memory mappings rather than ioctls, so the impact on heavy AI training or inference is usually small (often <1-2%). Workloads that make many driver calls, for example by frequently allocating and freeing GPU memory, pay proportionally more.

What is "nv" in "nvproxy"?

"nv" stands for NVIDIA. It does NOT support AMD GPUs.

nvproxy is built exclusively for NVIDIA hardware. However, gVisor provides a separate, similar feature called tpuproxy for Google TPUs.

Is nvproxy in a separate repo?

No, it is inside the gVisor repo, under pkg/sentry/devices/nvproxy.

Additionally:

  • Definitions for structures and constants from the NVIDIA kernel driver are housed in pkg/abi/nvidia.
  • Logic for the runsc nvproxy subcommands (like list-supported-drivers) lives under the runsc/ directory.

Summary

nvproxy is gVisor's "firewall" for NVIDIA GPUs. It lets you run CUDA and AI workloads inside a secure sandbox by intercepting, validating, and forwarding GPU commands to the host kernel, preventing the container from directly attacking the host's GPU driver.