gVisor - vDSO
In gVisor, the use of vDSO is a critical performance optimization. Because gVisor is a "user-space kernel" (the Sentry), the cost of a "system call" is even higher than in standard Linux.
If gVisor didn't use vDSO, every request for the current time from a containerized application would trigger a context switch into the Sentry, which might then have to ask the host kernel for the time: a double performance hit.
What is vDSO
Read more: Linux - vDSO
The Custom vDSO Image
gVisor does not simply pass the host's vDSO through to the application. Instead, gVisor provides its own custom-built vDSO ELF binary.
When the Sentry starts a new process, it maps this custom vDSO into the process's memory space. This allows gVisor to control exactly what code is executed when the application calls functions like gettimeofday() or clock_gettime().
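On a regular Linux host, a mapped vDSO appears as a [vdso] entry in /proc/<pid>/maps. The sketch below parses that text format to locate the mapping. The helper name find_vdso and the SAMPLE addresses are illustrative, and it is an assumption (not something stated above) that a sandboxed process sees its mapping under the same [vdso] label.

```python
import os

def find_vdso(maps_text):
    """Return (start, end) of the [vdso] mapping in /proc/<pid>/maps text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Line format: address-range perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5] == "[vdso]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None

# Made-up sample lines in the /proc/<pid>/maps format.
SAMPLE = """\
7ffd1c5f2000-7ffd1c5f6000 r--p 00000000 00:00 0                          [vvar]
7ffd1c5f6000-7ffd1c5f8000 r-xp 00000000 00:00 0                          [vdso]
"""

if __name__ == "__main__":
    print(find_vdso(SAMPLE))
    # Only meaningful on Linux-like systems that expose procfs.
    if os.path.exists("/proc/self/maps"):
        with open("/proc/self/maps") as f:
            print(find_vdso(f.read()))
```

Running this inside and outside a sandbox and comparing the reported range is a quick way to confirm that a vDSO mapping is present at all.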
The Shared "Params" Page
To make the vDSO work without switching to the Sentry, there needs to be shared data.
- The Sentry (Kernel): Periodically updates a specific page of memory with the current time (the "vDSO parameters page").
- The Application: The custom gVisor vDSO code reads from this memory page.
- The Mapping: gVisor maps this data page into the application's address space as read-only.
By doing this, the application can calculate the precise time using only user-space instructions, just like it would on a native Linux kernel.
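A reader of a periodically updated page has to avoid seeing a half-written update. A common technique for this is a sequence counter: the writer makes the counter odd while updating, and the reader retries until it sees the same even value before and after its read. The sketch below models that pattern in Python. The page layout, field names, and function names are hypothetical and are not taken from gVisor's source. Real implementations also need memory barriers, which this single-threaded model omits.

```python
import struct

# Hypothetical page layout (NOT gVisor's actual struct): four little-endian u64s.
#   seq           even = stable, odd = writer is mid-update
#   base_cycles   hardware counter value at the last update
#   base_ref_ns   clock value in nanoseconds at the last update
#   frequency_hz  estimated counter frequency
LAYOUT = struct.Struct("<QQQQ")

def write_params(page, base_cycles, base_ref_ns, frequency_hz):
    """Writer side (the Sentry's role): bump seq to odd, write fields, bump to even."""
    seq = struct.unpack_from("<Q", page, 0)[0]
    struct.pack_into("<Q", page, 0, seq + 1)
    struct.pack_into("<QQQ", page, 8, base_cycles, base_ref_ns, frequency_hz)
    struct.pack_into("<Q", page, 0, seq + 2)

def read_params(page, max_retries=1000):
    """Reader side (the vDSO's role): retry until a consistent snapshot is seen."""
    for _ in range(max_retries):
        seq_before, base_cycles, base_ref_ns, frequency_hz = LAYOUT.unpack_from(page, 0)
        seq_after = struct.unpack_from("<Q", page, 0)[0]
        if seq_before % 2 == 0 and seq_before == seq_after:
            return base_cycles, base_ref_ns, frequency_hz
    raise RuntimeError("params page never became stable")

page = bytearray(4096)                      # writable view held by the "kernel"
app_view = memoryview(page).toreadonly()    # read-only view, like the app's mapping

write_params(page, base_cycles=1_000, base_ref_ns=5_000_000_000,
             frequency_hz=2_000_000_000)
print(read_params(app_view))
```

The read-only memoryview stands in for the read-only mapping: the application side can observe updates but cannot modify the parameters.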
Handling Time (The Primary Use Case)
Timekeeping is the most common reason gVisor uses vDSO.
- Monotonic and Realtime Clocks: gVisor’s Sentry calculates the offsets between the host clock and the container's clock. It writes these offsets and the "multiplier" values into the shared params page.
- The Logic: The vDSO code provided by gVisor contains the logic to read the CPU's hardware counter (like RDTSC on x86) and apply the offsets from the shared page to return the correct "virtualized" time to the application.
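The arithmetic behind this can be sketched as a linear extrapolation from the last published sample. The text above speaks of offsets and multiplier values; the version below assumes the simplest equivalent form, a base sample plus a counter frequency, and all parameter names are hypothetical. Python cannot issue RDTSC directly, so time.perf_counter_ns() stands in for the hardware counter, treated as a counter ticking at 1 GHz.

```python
import time

NSEC_PER_SEC = 1_000_000_000

def cycles_to_ns(cycles, base_cycles, base_ref_ns, frequency_hz):
    """Extrapolate the clock from a counter reading and the last published sample."""
    elapsed_cycles = cycles - base_cycles
    # Integer math: convert elapsed cycles to nanoseconds, then add the base time.
    return base_ref_ns + elapsed_cycles * NSEC_PER_SEC // frequency_hz

def read_counter():
    """Stand-in for RDTSC; perf_counter_ns() counts nanoseconds, i.e. 1 GHz."""
    return time.perf_counter_ns()

# The "kernel" publishes a sample: counter value plus the virtual time at that moment.
base_cycles = read_counter()
base_ref_ns = 1_700_000_000 * NSEC_PER_SEC   # arbitrary virtualized realtime value

# The "vDSO" later computes the current virtual time without any system call.
now_ns = cycles_to_ns(read_counter(), base_cycles, base_ref_ns,
                      frequency_hz=NSEC_PER_SEC)
print(now_ns - base_ref_ns, "ns since the sample")
```

Because base_ref_ns is chosen by the writer, the returned time can differ arbitrarily from the host's clock while still advancing at the hardware counter's rate.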
Why not use the Host's vDSO?
You might wonder why gVisor doesn't just let the app use the host's vDSO. There are two main reasons:
- Isolation (Security): The host vDSO might leak information about the host system that gVisor wants to hide.
- Clock Virtualization: gVisor allows for "time traveling" or frozen clocks (common in testing or checkpoint/restore). If the app used the host vDSO, it would always see the host's real time, breaking the abstraction.
Interaction with "Platforms"
gVisor's vDSO behavior changes slightly depending on which platform it is running:
- KVM Platform: gVisor can take advantage of hardware features. The vDSO can often use the same high-speed timing mechanisms as the host.
- Ptrace Platform: This is much slower. Without vDSO, every gettimeofday would trigger a ptrace trap, which is incredibly expensive. In this mode, gVisor's vDSO is essential for keeping the application's performance acceptable.
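One rough way to see the difference on a given platform is to time a clock read against a call that has to enter the kernel. The micro-benchmark below is a sketch under two assumptions: that time.monotonic_ns() resolves through the vDSO (as it normally does on Linux), and that os.getppid() is a real system call on every invocation. The helper name ns_per_call is illustrative, and Python's interpreter overhead is included in both numbers, so only the gap between them is meaningful.

```python
import os
import time

def ns_per_call(fn, iterations=200_000):
    """Average wall-clock nanoseconds per call of fn, including loop overhead."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations

if __name__ == "__main__":
    # Assumed to be served by the vDSO, without entering the kernel.
    clock_ns = ns_per_call(time.monotonic_ns)
    # Assumed to be a genuine system call each time.
    syscall_ns = ns_per_call(os.getppid)
    print(f"clock read:   {clock_ns:.0f} ns/call")
    print(f"real syscall: {syscall_ns:.0f} ns/call")
```

Running the same script natively and inside a sandbox on each platform gives a direct, if coarse, comparison of how much a trapped call costs relative to a vDSO-served one.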
eh_frame
gVisor must explicitly ensure that the custom vDSO ELF it maps into the application contains a valid .eh_frame_hdr, the index that unwinders use to locate call-frame information in .eh_frame.
Because gVisor is written in Go, and Go has its own way of handling stacks, the developers have to be very careful that the assembly code in the gVisor vDSO is properly described so that standard Linux tools (like gdb or ptrace) can still unwind the stack through the gVisor boundary.
How gVisor uses .lds files
In the gVisor source code, you will find .lds files (often with a .lds.S extension if they need to be processed by a preprocessor) in the parts of the code that generate the Sentry and the vDSO.
- For the Sentry: The linker script ensures the Sentry is loaded into a specific memory range that doesn't conflict with the application it is trying to sandbox.
- For the vDSO: gVisor provides a custom linker script to build the vDSO ELF binary so that it matches exactly what a Linux application expects to see, including the correct version tags (like LINUX_2.6).
Summary of the Flow
- Sentry maps a Custom vDSO Library and a Data Page into the App's memory.
- Sentry updates the Data Page with current time/clock information.
- App calls clock_gettime(), which points to the Custom vDSO.
- vDSO Code reads the Data Page and the CPU TSC, calculates the time, and returns.
- Result: The "System Call" never actually leaves user-space, and the Sentry is never even notified, making the call nearly as fast as native Linux.