Linux - ftrace

ftrace (short for "Function Tracer") is the official tracing framework built into the Linux kernel. Often called the "Swiss Army Knife" of kernel debugging, it lets developers and system administrators see what is happening inside a running Linux kernel in real time.

What does it actually do?

While the name implies it only traces functions, ftrace is a framework that hosts several different types of tracers:

Function Tracing: Tracks every kernel function call as it happens.
Function Graph Tracing: Shows the entry and exit of functions, providing a visual flow (nested calls) and timing data.
Context Switch Tracing: Tracks when processes are scheduled on and off the CPU.
Interrupt Tracing: Measures how long interrupts are disabled (critical for debugging system "stutter" or latency).
Event Tracing: Monitors specific kernel events like disk I/O, network packets, or system calls.
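
Which of these tracers a given kernel offers depends on how it was built. The following is a minimal sketch for checking, assuming the tracing directory is /sys/kernel/debug/tracing and the caller is root; has_tracer is a hypothetical helper name, not part of ftrace.

```sh
# Hypothetical helper: succeed if tracer $1 is listed in the
# available_tracers file under tracing directory $2.
has_tracer() {
    tracer=$1
    dir=${2:-/sys/kernel/debug/tracing}
    # available_tracers is one space-separated line; split it into one
    # name per line and look for an exact, whole-line match.
    tr ' ' '\n' < "$dir/available_tracers" | grep -qx -- "$tracer"
}

# Usage (as root):
#   for t in function function_graph irqsoff; do
#       if has_tracer "$t"; then echo "$t: available"; else echo "$t: not built in"; fi
#   done
```

The exact match matters because one tracer name can be a prefix of another (function and function_graph).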

How it Works (The Magic of NOPs)

One of the most impressive things about ftrace is its low overhead.

When the kernel is compiled with ftrace support, the compiler inserts a call to a profiling routine (mcount, or __fentry__ on newer builds) at the start of nearly every function.
With dynamic ftrace (CONFIG_DYNAMIC_FTRACE), the kernel replaces these calls with NOPs (No-Operations) at boot time. This means that, when ftrace is off, there is almost zero performance penalty.
When you enable tracing for a specific function, ftrace patches that function's NOP back into a call to the tracing code, and restores the NOP when tracing stops.
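
The effect of this patching can be observed from user space. The sketch below assumes dynamic ftrace is enabled, the tracing directory is /sys/kernel/debug/tracing, and the caller is root; patch_site_summary is a hypothetical helper name. It compares available_filter_functions (functions whose call site can be patched) with enabled_functions (functions that currently have a callback attached).

```sh
# Hypothetical helper: report how many functions can be traced and how
# many currently have a tracing callback attached.
patch_site_summary() {
    dir=${1:-/sys/kernel/debug/tracing}
    # One line per function whose call site ftrace is able to patch.
    traceable=$(wc -l < "$dir/available_filter_functions")
    # One line per function whose NOP is currently replaced by a call.
    active=$(wc -l < "$dir/enabled_functions")
    # The arithmetic expansion strips any padding that wc may add.
    echo "traceable=$((traceable + 0)) active=$((active + 0))"
}

# Usage (as root):
#   patch_site_summary
```

With tracing off, the active count is expected to be zero or close to it, which is the "almost zero penalty" state described above.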

How to use it

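ftrace does not require special software; it is exposed through a virtual filesystem (tracefs). You interact with it, usually as root, by reading and writing files in /sys/kernel/debug/tracing. On kernels since 4.1, tracefs can also be mounted directly at /sys/kernel/tracing.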
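
Because the mount point varies between systems, a script may want to probe for it first. This is a minimal sketch, assuming the two conventional locations /sys/kernel/tracing and /sys/kernel/debug/tracing and a root shell; find_tracefs is a hypothetical helper name.

```sh
# Hypothetical helper: print the first candidate directory that looks
# like a tracing directory (it contains a current_tracer file).
find_tracefs() {
    # With no arguments, fall back to the two conventional mount points.
    [ "$#" -gt 0 ] || set -- /sys/kernel/tracing /sys/kernel/debug/tracing
    for dir in "$@"; do
        if [ -e "$dir/current_tracer" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "tracefs not found; try: mount -t tracefs nodev /sys/kernel/tracing" >&2
    return 1
}

# Usage (as root):
#   cd "$(find_tracefs)" || exit 1
```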

Basic Workflow:

Navigate to the directory:
```
cd /sys/kernel/debug/tracing
```

Check available tracers:

```
cat available_tracers
# Output: function, function_graph, nop, etc.
```

Enable a tracer (e.g., function_graph):
```
echo function_graph > current_tracer
```
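
Filter for a specific function (optional but recommended): instead of tracing everything (which produces far too much data), trace only one function. With function_graph, set_ftrace_filter records that function alone; to also see the calls nested inside it, write the name to set_graph_function instead.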
```
echo vfs_read > set_ftrace_filter
```
View the output:
```
cat trace | head -n 20
```
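
The steps above can be combined into one helper that also restores the defaults afterwards, so a forgotten tracer does not keep running. This is a sketch under assumptions: the directory is /sys/kernel/debug/tracing, the caller is root, and trace_one_function is a hypothetical name.

```sh
# Hypothetical helper: trace function $1 for $2 seconds with the
# function_graph tracer under tracing directory $3, print the first
# 20 lines of the trace, then restore the defaults.
trace_one_function() {
    func=$1
    secs=${2:-1}
    dir=${3:-/sys/kernel/debug/tracing}
    (
        cd "$dir" || exit 1
        # Set the filter first so the tracer never records everything.
        echo "$func" > set_ftrace_filter || exit 1
        echo function_graph > current_tracer
        echo 1 > tracing_on          # make sure the ring buffer records
        sleep "$secs"                # let some activity accumulate
        echo 0 > tracing_on          # freeze the buffer before reading
        head -n 20 trace
        echo nop > current_tracer    # turn the tracer off again
        echo > set_ftrace_filter     # clear the filter
        echo 1 > tracing_on          # restore the default recording state
    )
}

# Usage (as root):
#   trace_one_function vfs_read 2
```

The subshell keeps the cd from affecting the calling shell.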

Example Output (Function Graph)

The function_graph tracer is popular because it’s easy to read. It looks like C code:

```
 0)               |  vfs_read() {
 0)               |    rw_verify_area() {
 0)   0.125 us    |      security_file_permission();
 0)   0.501 us    |    }
 0)               |    __vfs_read() {
 ...
```

This tells you which sub-functions were called and how long each took in microseconds. The time on a closing-brace line is the total for the function that the brace closes.
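
For longer traces it can help to flatten this output into "function duration" pairs. The following awk-based sketch assumes the default function_graph layout shown above and a trace from a single CPU (for example per_cpu/cpu0/trace), since interleaved CPUs would confuse the nesting; graph_durations is a hypothetical helper name.

```sh
# Hypothetical helper: read function_graph output on stdin and print
# "name duration_us" for every completed call.
graph_durations() {
    awk -F'|' '
        # Skip the header lines that start with a hash.
        /^#/ { next }
        # Entry line such as "vfs_read() {": push the name on a stack.
        $2 ~ /[{][ \t]*$/ {
            name = $2
            gsub(/^[ \t]+|\(\).*$/, "", name)
            stack[++depth] = name
            next
        }
        # Exit line "}": the duration belongs to the name on top.
        $2 ~ /^[ \t]*[}]/ && depth > 0 {
            print stack[depth], duration($1)
            depth--
            next
        }
        # Leaf line such as "security_file_permission();".
        $2 ~ /\(\);/ {
            name = $2
            gsub(/^[ \t]+|\(\);.*$/, "", name)
            print name, duration($1)
        }
        # Return the number that precedes "us" in the left-hand column.
        function duration(col,   i, n, parts) {
            n = split(col, parts, /[ \t]+/)
            for (i = 2; i <= n; i++)
                if (parts[i] == "us") return parts[i - 1]
            return "?"
        }
    '
}

# Usage (as root):
#   graph_durations < /sys/kernel/debug/tracing/per_cpu/cpu0/trace
```

Fed the sample above, this prints security_file_permission with 0.125 and rw_verify_area with 0.501; calls whose closing brace is not in the input are not reported.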

Why use ftrace instead of other tools?

No external dependencies: Unlike eBPF (which needs a compiler/library) or SystemTap, ftrace is already there on almost any Linux system. If you have a root shell on a kernel built with tracing support, you have ftrace.
Safety: It is maintained as part of the kernel itself and designed to be safe for production use, although tracing every function on a busy system still adds noticeable overhead.
Latency Analysis: It is well suited to finding "long-tail latency": those rare moments where the system hangs for a few milliseconds for no apparent reason.
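
As an illustration of latency analysis, the irqsoff tracer records the longest stretch during which interrupts were disabled. This is a minimal sketch, assuming a kernel built with that tracer (CONFIG_IRQSOFF_TRACER), the directory /sys/kernel/debug/tracing, and a root shell; measure_irqsoff is a hypothetical helper name.

```sh
# Hypothetical helper: run the irqsoff tracer for $1 seconds under
# tracing directory $2 and print the worst latency it recorded.
measure_irqsoff() {
    secs=${1:-5}
    dir=${2:-/sys/kernel/debug/tracing}
    (
        cd "$dir" || exit 1
        echo irqsoff > current_tracer || exit 1
        echo 0 > tracing_max_latency   # reset the recorded maximum
        sleep "$secs"                  # let the system run under observation
        # tracing_max_latency is reported in microseconds.
        echo "max irqs-off latency: $(cat tracing_max_latency) us"
        echo nop > current_tracer      # turn the tracer off again
    )
}

# Usage (as root):
#   measure_irqsoff 10
```

While the tracer is active, the trace file in the same directory holds the call path behind the recorded maximum, so it is worth reading before switching back to nop.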

Frontends (The easy way)

Directly manipulating files in /sys/kernel/debug/tracing can be tedious. Most people use frontends:

trace-cmd: A command-line tool that makes it easy to record and report ftrace data (trace-cmd record -p function ...).
KernelShark: A GUI tool that turns trace-cmd data into visual graphs and timelines.
Perf: While a separate tool, perf can hook into ftrace infrastructure (for example through its perf ftrace subcommand).
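
For comparison with the manual workflow, here is a sketch of the same kind of trace taken with trace-cmd. It assumes trace-cmd is installed and run as root; trace_cmd_graph is a hypothetical wrapper name, while -p (tracer plugin) and -g (function to graph) are trace-cmd record options.

```sh
# Hypothetical wrapper: record a function_graph trace of function $1
# while the remaining arguments run as a command, then show the start
# of the report.
trace_cmd_graph() {
    func=$1
    shift
    # record writes trace.dat in the current directory; report reads it.
    trace-cmd record -p function_graph -g "$func" "$@" &&
        trace-cmd report | head -n 20
}

# Usage (as root):
#   trace_cmd_graph vfs_read sleep 1
```

trace-cmd resets the tracer and filters itself when the recording ends, which is the main convenience over writing to the files by hand.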

Summary

If you are a kernel developer, a driver writer, or an SRE (Site Reliability Engineer) trying to figure out why a specific system call is slow, ftrace is the tool you use to "look under the hood" while the engine is running.