Linux - Tracepoints
In the Linux kernel, Tracepoints are static "hooks" or "markers" manually inserted into the kernel source code by developers. They allow you to "spy" on the kernel’s internal behavior at specific, critical locations with very little performance impact.
Think of a Tracepoint as a built-in sensor in a car’s engine. The sensor is always there, but it doesn't do anything (and doesn't waste energy) until you plug in a diagnostic tool to read its data.
How They Work
Tracepoints are implemented using a mechanism called "Jump Labels."
- When Disabled (Default): A Tracepoint is essentially a NOP (No-Operation) instruction or a branch that is never taken. The CPU skips right over it, so it has near-zero overhead.
- When Enabled: The NOP instruction is dynamically patched in memory with a jump to a "probe" function. This probe function collects the data and sends it to a tracing tool (like ftrace, perf, or eBPF).
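The enable/disable behavior can be mimicked in user space. The sketch below is only an analogy: the class name ToyTracepoint and its methods are hypothetical, and the real kernel rewrites machine instructions rather than rebinding a Python attribute. It shows a call site that does nothing until a probe is attached, and goes back to doing nothing once the last probe is removed.

```python
# Toy user-space model of a tracepoint call site. Analogy only: the real
# kernel patches machine instructions, not Python attributes.

class ToyTracepoint:
    def __init__(self, name):
        self.name = name
        self._probes = []
        # Disabled by default: the call site points at a do-nothing function.
        self.fire = self._nop

    def _nop(self, *args):
        """Stands in for the NOP instruction: nothing is collected."""
        return None

    def _dispatch(self, *args):
        """Stands in for the patched-in jump: run every attached probe."""
        for probe in self._probes:
            probe(*args)

    def attach(self, probe):
        self._probes.append(probe)
        self.fire = self._dispatch  # "patch" the call site

    def detach(self, probe):
        self._probes.remove(probe)
        if not self._probes:
            self.fire = self._nop  # restore the NOP


def demo():
    seen = []
    tp = ToyTracepoint("sched_switch")
    tp.fire(1, 2)  # disabled: nothing is recorded

    def probe(prev_pid, next_pid):
        seen.append((prev_pid, next_pid))

    tp.attach(probe)
    tp.fire(100, 200)  # enabled: the probe runs
    tp.detach(probe)
    tp.fire(3, 4)  # disabled again
    return seen
```

Calling demo() returns only the pair fired while the probe was attached, which is the point of the design: code on the hot path pays almost nothing unless someone is listening.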
The Anatomy of a Tracepoint
Developers define tracepoints using the TRACE_EVENT macro. This macro defines:
- Arguments: What data is being captured (e.g., a process ID, a filename, or a network packet length).
- Storage: How that data is formatted into a binary buffer to save space.
- Output: How that data should be printed as text for a human to read.
Example: A common tracepoint is sched_switch, which triggers every time the CPU switches from one task to another. It records the PID and name of the "prev" and "next" tasks.
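To make the three parts concrete, here is a minimal Python model of the storage and output steps for a sched_switch-like record. It is a sketch under stated assumptions: the layout is simplified to two task names and two PIDs (the real event carries more fields, such as priorities and task state), and the names RECORD, store, and render are hypothetical. In the kernel, these pieces are written in C inside the TRACE_EVENT macro.

```python
import struct

# Simplified, assumed layout: two 16-byte task names and two 32-bit PIDs,
# little-endian with no padding (40 bytes total).
RECORD = struct.Struct("<16si16si")


def store(prev_comm, prev_pid, next_comm, next_pid):
    """Storage step: pack the arguments into a compact binary record."""
    return RECORD.pack(prev_comm.encode(), prev_pid,
                       next_comm.encode(), next_pid)


def render(raw):
    """Output step: turn the binary record back into readable text."""
    prev_comm, prev_pid, next_comm, next_pid = RECORD.unpack(raw)
    prev_name = prev_comm.rstrip(b"\0").decode()
    next_name = next_comm.rstrip(b"\0").decode()
    return (f"prev_comm={prev_name} prev_pid={prev_pid} ==> "
            f"next_comm={next_name} next_pid={next_pid}")
```

For example, render(store("bash", 1234, "kworker/0:1", 17)) yields a single readable line, while the stored form stays a fixed 40 bytes regardless of how the text is later printed.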
Key Advantages
- Stability (The "API" Guarantee): This is the biggest advantage. Because tracepoints are explicitly named and defined in the code, kernel developers try not to change them. If you write a script to monitor sched_switch, it will likely work on Linux 5.4, 6.1, and most later releases, although individual fields are occasionally added or changed.
- Performance: Because they are compiled into the kernel and use jump labels, they generally have lower overhead than dynamic probes (Kprobes).
- Structure: They provide clean, structured data. You don't have to guess which CPU register holds a variable; the tracepoint gives you the variable name directly.
How to find and use them
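You can explore tracepoints directly through the filesystem. The kernel exposes them via tracefs, which is mounted at /sys/kernel/tracing on modern kernels and is also reachable at /sys/kernel/debug/tracing when debugfs is mounted. Reading these files usually requires root.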
List all available tracepoints:
Tracepoints are organized by "subsystems" (like sched, net, block, syscalls).
ls /sys/kernel/debug/tracing/events
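The same listing can be done programmatically. The sketch below walks the events directory and returns names in the subsystem:event form that tools like bpftrace use. The function name list_tracepoints is hypothetical, and the code assumes the usual tracefs layout in which every event is a directory containing a format file.

```python
import os


def list_tracepoints(events_dir="/sys/kernel/debug/tracing/events"):
    """Return 'subsystem:event' names under events_dir, grouped by subsystem."""
    names = []
    for subsystem in sorted(os.listdir(events_dir)):
        sub_path = os.path.join(events_dir, subsystem)
        if not os.path.isdir(sub_path):
            continue  # skip control files such as 'enable'
        for event in sorted(os.listdir(sub_path)):
            # Assumption: a real event directory always has a 'format' file.
            if os.path.isfile(os.path.join(sub_path, event, "format")):
                names.append(f"{subsystem}:{event}")
    return names
```

On a real system this would typically be run as root, for example print("\n".join(list_tracepoints())). Passing a different events_dir is useful when tracefs is mounted somewhere else.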
Why "events" instead of "tracepoints"? The relationship between an event and a tracepoint is usually 1:1. When you see an event name like sched_switch in your tracing tools, it is tied to exactly one tracepoint hook in the kernel code.
View the "format" of a specific tracepoint:
To see what data a tracepoint captures (e.g., openat syscall):
cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat/format
This will show you the fields, such as int dfd and const char * filename, along with each field's offset and size in the binary record.
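Because the format file has a regular layout, it is easy to parse. The sketch below extracts each field's type, name, offset, and size. SAMPLE_FORMAT is an illustrative excerpt written for this example, not captured output: the ID, offsets, and sizes are assumptions that vary by kernel and architecture. The names parse_format and FIELD_RE are hypothetical.

```python
import re

# Illustrative excerpt only: the ID, offsets and sizes are assumed values
# and differ between kernels and architectures.
SAMPLE_FORMAT = """\
name: sys_enter_openat
ID: 614
format:
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;

\tfield:int __syscall_nr;\toffset:8;\tsize:4;\tsigned:1;
\tfield:int dfd;\toffset:16;\tsize:8;\tsigned:0;
\tfield:const char * filename;\toffset:24;\tsize:8;\tsigned:0;
"""

FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Return one dict per 'field:' line found in a format file's text."""
    fields = []
    for match in FIELD_RE.finditer(text):
        # The declaration is 'C type' + space + 'name'; split on the last space.
        ctype, _, name = match.group("decl").strip().rpartition(" ")
        fields.append({
            "name": name,
            "type": ctype,
            "offset": int(match.group("offset")),
            "size": int(match.group("size")),
            "signed": match.group("signed") == "1",
        })
    return fields
```

To use it on a live system, read the real file and pass its contents, for example parse_format(open(path).read()). Array fields such as char comm[16] keep the bracket suffix in their name with this simple split.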
Using them with bpftrace:
Tracepoints are a favorite tool for BPF users. To see every time a file is opened system-wide:
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opened %s\n", comm, str(args->filename)); }'
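bpftrace is not the only way to switch a tracepoint on. With ftrace, each event directory contains an enable file, and writing 1 or 0 to it turns the event on or off. A minimal sketch, assuming the tracing directory used earlier in this article and a hypothetical helper name set_event_enabled:

```python
import os


def set_event_enabled(subsystem, event, enabled,
                      tracing_dir="/sys/kernel/debug/tracing"):
    """Write '1' or '0' to the event's enable file and return its path."""
    path = os.path.join(tracing_dir, "events", subsystem, event, "enable")
    with open(path, "w") as enable_file:
        enable_file.write("1" if enabled else "0")
    return path
```

After set_event_enabled("syscalls", "sys_enter_openat", True), run as root, the recorded events can be read from the trace_pipe file in the same tracing directory. Remember to disable the event again when finished.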
Summary
| Feature | Description |
|---|---|
| Origin | Hardcoded by kernel developers in C code. |
| Overhead | Near zero when off; very low when on. |
| Flexibility | Limited (you can only trace what's already there). |
| Reliability | High (stable names and data formats). |
| Analogy | A factory-installed dashboard gauge. |
In short: Tracepoints are the "official" way to observe the kernel. If a tracepoint exists for what you want to do, prefer it over a Kprobe.