eBPF
What is a BPF Object?
In the context of Linux and eBPF (Extended Berkeley Packet Filter), a BPF Object refers to the compiled binary artifact (usually an ELF file) that contains your BPF code and data structures.
If you are writing BPF tools (using libbpf), the "BPF Object" (struct bpf_object) is the main handle you use in your user-space code to manage everything you are about to load into the kernel.
Think of the BPF Object as a Container or a Package.
1. What is inside a BPF Object?
When you compile your C code (e.g., my_tool.bpf.c) using clang, it creates an .o file. This file is the BPF Object. It contains three main things:
- BPF Programs: The actual functions that will run in the kernel (e.g., "Trigger this code when sys_open is called"). A single object can contain multiple programs.
- BPF Maps: The shared memory structures (Hash tables, Arrays) used to store data or share it between the kernel and user space.
- Relocation Info & BTF: Metadata that tells libbpf how to adjust the code so it works on the specific version of Linux you are running (this is the "Compile Once, Run Everywhere", or CO-RE, magic).
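A toy Python model can make the relocation idea concrete. This is not how libbpf is implemented. LoadInsn, apply_core_relocations, and both struct layouts are hypothetical. The sketch only shows the core trick: the object records which field an instruction reads, so a loader can replace the compile-time byte offset with the offset that field has on the running kernel.

```python
from dataclasses import dataclass


@dataclass
class LoadInsn:
    """A simplified "load a field from a struct" instruction (hypothetical)."""
    field: str   # relocation record: which field the compiler meant to read
    offset: int  # byte offset baked in at compile time


def apply_core_relocations(insns, running_layout):
    """Return copies of insns with offsets taken from the running kernel's layout."""
    patched = []
    for insn in insns:
        if insn.field not in running_layout:
            raise LookupError(f"field {insn.field!r} does not exist on this kernel")
        patched.append(LoadInsn(insn.field, running_layout[insn.field]))
    return patched


# Hypothetical layouts: offsets as seen when the program was compiled...
compiled = [LoadInsn("pid", 16), LoadInsn("comm", 24)]
# ...and on a running kernel where an 8-byte "flags" field now precedes "pid".
running_layout = {"flags": 16, "pid": 24, "comm": 32}

for insn in apply_core_relocations(compiled, running_layout):
    print(insn.field, insn.offset)
# pid 24
# comm 32
```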
2. The Hierarchy
It helps to visualize the hierarchy libbpf uses:
- BPF Object (The File)
  - Contains: BPF Map 1
  - Contains: BPF Map 2
  - Contains: BPF Program A (e.g., for kprobe/sys_execve)
  - Contains: BPF Program B (e.g., for tracepoint/syscalls/sys_enter_open)
3. How you use it (The Lifecycle)
In a typical BPF application (like one written in C or Go), the "Object" is the unit you manage during the setup phase:
- Open: You "open" the BPF Object. libbpf reads the ELF file and parses its sections, but doesn't touch the kernel yet.
- Load: You "load" the BPF Object. libbpf creates the Maps in the kernel, applies relocations, and loads the Programs. The kernel verifier checks each program's bytecode before accepting it.
- Attach: You "attach" the specific Programs inside the Object to their hooks (events).
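The ordering rules above can be sketched as a toy Python state machine. ToyBpfObject and its methods are hypothetical stand-ins, not libbpf's API. In libbpf itself these steps correspond roughly to bpf_object__open_file(), bpf_object__load(), and bpf_program__attach().

```python
from enum import Enum, auto


class Stage(Enum):
    OPENED = auto()
    LOADED = auto()


class ToyBpfObject:
    """Toy stand-in for a BPF object: enforces open -> load -> attach."""

    def __init__(self, path, maps, programs):
        # Open: record what the file contains; nothing exists in the kernel yet.
        self.path = path
        self.maps = {name: None for name in maps}          # None = not created
        self.programs = {name: None for name in programs}  # name -> attached hook
        self.stage = Stage.OPENED

    def load(self):
        # Load: maps get created here; programs become attachable.
        if self.stage is not Stage.OPENED:
            raise RuntimeError("object is already loaded")
        self.maps = {name: {} for name in self.maps}  # each map becomes a store
        self.stage = Stage.LOADED

    def attach(self, program, hook):
        # Attach: connect one loaded program to its event.
        if self.stage is not Stage.LOADED:
            raise RuntimeError("load the object before attaching programs")
        if program not in self.programs:
            raise KeyError(program)
        self.programs[program] = hook


obj = ToyBpfObject("my_tool.bpf.o", maps=["counts"], programs=["handle_exec"])
obj.load()
obj.attach("handle_exec", "kprobe/sys_execve")
print(obj.programs)  # {'handle_exec': 'kprobe/sys_execve'}
```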
Summary
- BPF Program: A single function (bytecode) that runs on an event.
- BPF Map: A storage area for data.
- BPF Object: The entire collection of programs and maps compiled together into a single file, which you load and manage as a group.
What is a Map?
In the context of BPF (eBPF), a Map is a shared data structure that allows the BPF program (running in the kernel) and your application (running in user space) to talk to each other.
Since BPF programs are highly restricted—they cannot access arbitrary memory and they exit immediately after their event finishes—they need a specific place to store data. That place is the Map.
Think of a BPF Map as a Shared Whiteboard or a Dropbox.
1. Why do we need Maps?
BPF programs are "stateless." If you have a BPF program attached to a network packet, it wakes up, inspects one packet, and then vanishes. It doesn't remember what happened to the previous packet.
Maps solve two specific problems:
- Statefulness: They allow the BPF program to remember data between events (e.g., "I have seen 5 packets from this IP address so far").
- Communication: They allow the User Space application to read what the Kernel is seeing, or to send configuration down to the Kernel.
2. How it works (The Architecture)
```
   [ User Space App ]                  [ Kernel Space ]
 (Your Python/C/Go Tool)                (BPF Program)
            |                                 |
            | R/W                             | R/W
            v                                 v
+--------------------------------------------------+
|                     BPF MAP                      |
|               (Key -> Value Store)               |
+--------------------------------------------------+
```
- The Kernel writes: A network packet arrives. The BPF program checks the Map: "Is this IP in the blocklist?" or it updates the Map: "Increment the counter for this IP."
- The User reads: Your tool running in the terminal reads the Map every second to display the current counters to you.
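Here is a minimal Python sketch of this two-way flow. Plain Python containers stand in for the two maps. The names blocklist, packet_counts, on_packet, and user_snapshot are illustrative, and the IP addresses are made up. In real BPF code, the kernel side would use helpers such as bpf_map_lookup_elem() and bpf_map_update_elem(). User space would use libbpf's functions of the same name on a map file descriptor.

```python
from collections import defaultdict

blocklist = set()                 # map written by user space (configuration)
packet_counts = defaultdict(int)  # map written by the kernel side (state)


def on_packet(src_ip: str) -> str:
    """Toy BPF program: runs once per packet and keeps no state of its own."""
    packet_counts[src_ip] += 1                        # increment this IP's counter
    return "DROP" if src_ip in blocklist else "PASS"  # check the blocklist


def user_snapshot() -> dict:
    """Toy user-space tool: what it might read from the map every second."""
    return dict(packet_counts)


blocklist.add("10.0.0.66")  # user space sends configuration down
verdicts = [on_packet(ip) for ip in ["10.0.0.1", "10.0.0.66", "10.0.0.1"]]
print(verdicts)         # ['PASS', 'DROP', 'PASS']
print(user_snapshot())  # {'10.0.0.1': 2, '10.0.0.66': 1}
```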
3. Common Types of Maps
While they are generally Key-Value stores, there are different "flavors" optimized for different jobs:
A. Hash Map (BPF_MAP_TYPE_HASH)
- Structure: Standard Key-Value pair.
- Use Case: You don't know the keys in advance.
- Example: Counting how many bytes each specific Process ID (PID) has written.
- Key: PID (1234)
- Value: Bytes (500)
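To show how the hash map behaves, here is a toy Python model. ToyHashMap and on_write are hypothetical names, not a real API. The model captures the two properties that matter here: keys appear on demand, and the real map refuses new keys once max_entries is reached.

```python
class ToyHashMap:
    """Toy BPF_MAP_TYPE_HASH: keys are created on demand, up to max_entries."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = {}

    def lookup(self, key):
        return self._data.get(key)  # None means "no entry for this key"

    def update(self, key, value):
        # Overwriting is always allowed; adding a new key fails when full.
        if key not in self._data and len(self._data) >= self.max_entries:
            raise OverflowError("map is full")
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


bytes_written = ToyHashMap(max_entries=1024)


def on_write(pid: int, nbytes: int):
    """Toy BPF program: PIDs are not known in advance, so insert on first sight."""
    current = bytes_written.lookup(pid) or 0
    bytes_written.update(pid, current + nbytes)


on_write(1234, 300)
on_write(1234, 200)
on_write(99, 10)
print(bytes_written.lookup(1234))  # 500
```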
B. Array (BPF_MAP_TYPE_ARRAY)
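- Structure: A pre-allocated list (like a C array), indexed directly by an integer key. Lookups need no hashing, so it is generally faster than a Hash Map. The size is fixed when the map is created, and entries can be overwritten but not deleted.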
- Use Case: Global settings or a small set of known keys.
- Example: A simple on/off switch or global error counter.
- Key: 0 (Global Error Count)
- Value: 55
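The same caveat applies to this toy Python model of the array map's semantics: ToyArrayMap and the GLOBAL_ERRORS index are hypothetical. Every slot exists and reads as zero from the start. Keys are just indices, and entries can be overwritten but not deleted.

```python
class ToyArrayMap:
    """Toy BPF_MAP_TYPE_ARRAY: fixed size, every slot pre-allocated and zeroed."""

    def __init__(self, max_entries: int):
        self._slots = [0] * max_entries  # all entries exist from the start

    def lookup(self, index: int):
        if 0 <= index < len(self._slots):
            return self._slots[index]
        return None  # out-of-range keys have no entry

    def update(self, index: int, value: int):
        if not 0 <= index < len(self._slots):
            raise IndexError("key outside the array")
        self._slots[index] = value

    def delete(self, index: int):
        # Array entries can only be overwritten, never removed.
        raise ValueError("array map entries cannot be deleted")


GLOBAL_ERRORS = 0  # hypothetical: slot 0 holds the global error count
stats = ToyArrayMap(max_entries=1)
stats.update(GLOBAL_ERRORS, stats.lookup(GLOBAL_ERRORS) + 55)
print(stats.lookup(GLOBAL_ERRORS))  # 55
```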
C. Ring Buffer (BPF_MAP_TYPE_RINGBUF)
- Structure: A circular queue (First-In-First-Out).
- Use Case: Sending "events" to user space efficiently.
- Example: Every time a file is opened, the BPF program pushes the filename into the Ring Buffer. The User Space app sits in a loop pulling filenames out and printing them.
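Here is a toy Python model of the ring buffer. For simplicity it assumes fixed-size slots; the real BPF ring buffer stores variable-length records in a byte buffer. ToyRingBuffer is hypothetical. With the real map, the BPF side typically uses bpf_ringbuf_reserve()/bpf_ringbuf_submit() or bpf_ringbuf_output(), and user space polls with libbpf's ring_buffer__poll(). As with the real map, a full buffer means the producer's record is dropped; the kernel never waits for user space.

```python
class ToyRingBuffer:
    """Toy BPF_MAP_TYPE_RINGBUF: fixed capacity, FIFO, producer never waits."""

    def __init__(self, capacity: int):
        self._buf = [None] * capacity
        self._head = 0     # position of the oldest unread record
        self._count = 0    # number of unread records
        self.dropped = 0   # records lost because the buffer was full

    def submit(self, record) -> bool:
        """Kernel side: append a record, or drop it immediately if full."""
        if self._count == len(self._buf):
            self.dropped += 1
            return False
        tail = (self._head + self._count) % len(self._buf)
        self._buf[tail] = record
        self._count += 1
        return True

    def consume(self):
        """User side: yield unread records in arrival order."""
        while self._count:
            record = self._buf[self._head]
            self._head = (self._head + 1) % len(self._buf)
            self._count -= 1
            yield record


rb = ToyRingBuffer(capacity=2)
for filename in ["/etc/passwd", "/tmp/a", "/tmp/b"]:
    rb.submit(filename)  # the third submit is dropped: capacity is 2
print(list(rb.consume()), rb.dropped)  # ['/etc/passwd', '/tmp/a'] 1
```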
D. LPM Trie (BPF_MAP_TYPE_LPM_TRIE)
- Structure: A trie keyed by prefixes; a lookup returns the entry with the Longest Prefix Match.
- Use Case: Networking/Firewalls.
- Example: Matching IP subnets (like 192.168.1.0/24).
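The matching rule can be sketched with a toy binary trie in Python. ToyLpmTrie is hypothetical and handles IPv4 only. The real map's keys carry a prefix length followed by the address bytes; this sketch simply parses CIDR strings with the standard ipaddress module.

```python
import ipaddress


class ToyLpmTrie:
    """Toy BPF_MAP_TYPE_LPM_TRIE for IPv4: a binary trie indexed by prefix bits."""

    def __init__(self):
        self._root = {}  # children are keyed 0/1; "value" marks a stored prefix

    @staticmethod
    def _bits(addr: ipaddress.IPv4Address, length: int):
        n = int(addr)
        return [(n >> (31 - i)) & 1 for i in range(length)]  # most significant first

    def insert(self, cidr: str, value):
        net = ipaddress.IPv4Network(cidr)
        node = self._root
        for bit in self._bits(net.network_address, net.prefixlen):
            node = node.setdefault(bit, {})
        node["value"] = value

    def lookup(self, ip: str):
        """Follow the address bit by bit, remembering the deepest stored prefix."""
        node, best = self._root, None
        for bit in self._bits(ipaddress.IPv4Address(ip), 32):
            if "value" in node:
                best = node["value"]
            if bit not in node:
                return best
            node = node[bit]
        return node.get("value", best)


routes = ToyLpmTrie()
routes.insert("192.168.0.0/16", "site")
routes.insert("192.168.1.0/24", "lab-subnet")
print(routes.lookup("192.168.1.77"))  # lab-subnet
print(routes.lookup("192.168.9.1"))   # site
print(routes.lookup("10.0.0.1"))      # None
```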
4. A Concrete Example
Imagine you want to count how often the function sys_open is called.
- Create Map: You declare an Array Map with 1 slot.
- BPF Program:
  - Triggers when sys_open starts.
  - Reads the value at Index 0.
  - Adds +1.
  - Updates the value at Index 0.
- User App:
  - Every 1 second, it reads Index 0 from the Map.
  - Prints: "Files opened so far: [Value]".
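The steps above can be written as a toy Python simulation. The names counter_map, on_sys_open, and user_poll are hypothetical, and so are the per-second event counts. One caveat applies to real BPF code: the program can run on several CPUs at once, so a separate read-then-write can lose increments. Real programs usually take the pointer returned by bpf_map_lookup_elem() and increment it in place with an atomic such as __sync_fetch_and_add(), or they use a per-CPU array.

```python
OPEN_COUNT = 0     # hypothetical: index of the map's only slot
counter_map = [0]  # toy Array Map with exactly 1 slot


def on_sys_open():
    """Toy BPF program, run on every sys_open call."""
    value = counter_map[OPEN_COUNT]      # read the value at index 0
    counter_map[OPEN_COUNT] = value + 1  # add 1 and write it back


def user_poll() -> str:
    """Toy user app: what it prints on each 1-second tick."""
    return f"Files opened so far: {counter_map[OPEN_COUNT]}"


for opens_this_second in [3, 0, 2]:  # made-up event counts
    for _ in range(opens_this_second):
        on_sys_open()
    print(user_poll())
# Files opened so far: 3
# Files opened so far: 3
# Files opened so far: 5
```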
Why not just send a "Message"?
You might wonder why we don't just send a message like a standard API.
- Efficiency: Maps allow the kernel to keep working without waiting for the user-space app to "acknowledge" the data.
- Persistence: The data in a Map lives in the kernel, not in your process. If the map is pinned to the BPF filesystem (covered in the next section), your user-space app can crash and restart, reopen the map, and continue where it left off.
What are Pinned Maps?
A Pinned Map is a BPF map that has been given a permanent location in the file system (specifically, the BPF virtual file system) so that it stays alive even after the program that created it exits.
To understand why this is necessary, you have to understand the standard "lifecycle" of a map.
1. The Problem: The "Disappearing" Map
By default, BPF Maps are tied to the process (the tool) that creates them.
- You run your BPF tool (./my_monitor).
- The tool tells the kernel to create a Map.
- The kernel gives the tool a File Descriptor (FD) (a handle to hold onto that map).
- The Issue: If you close the tool, or if it crashes, the operating system closes its File Descriptors. Once nothing else holds a reference to the map (no other FD, no loaded program using it, no pin), the kernel deletes the map and all the data inside it is lost.
2. The Solution: Pinning
Pinning is the act of "exporting" that map to a path in the BPF filesystem (bpffs), usually mounted at /sys/fs/bpf/. Although it looks like a path on disk, bpffs lives in memory.
By doing this, you are effectively telling the kernel: "Don't delete this map when my program closes. Keep it alive because it is 'pinned' to this filename."
3. How it Works
Instead of the map existing only in the nebulous "memory space" of your application, it becomes a visible file entry.
- Creation: Your app creates a map (e.g., my_packet_counter).
- Pinning: Your app calls the bpf_obj_pin function to link that map to /sys/fs/bpf/my_packet_counter.
- Exit: Your app closes. The map stays in memory.
- Retrieval: Later, you restart your app (or run a completely different app). It calls bpf_obj_get on that path. The kernel recognizes the path and gives your new app access to the existing data.
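The sequence above can be modeled in a few lines of Python. ToyKernel is a hypothetical sketch of the reference-counting idea, not the kernel's implementation. Map ids double as file descriptors, and obj_pin and obj_get only mirror the roles of bpf_obj_pin() and bpf_obj_get(). A pin counts as one more reference, but it does not belong to any process.

```python
class ToyKernel:
    """Toy model: maps are freed when their reference count (FDs + pins) hits 0."""

    def __init__(self):
        self.maps = {}   # map id -> contents
        self.refs = {}   # map id -> open FDs + pins
        self.pins = {}   # bpffs path -> map id
        self._next_id = 1

    def create_map(self) -> int:
        map_id, self._next_id = self._next_id, self._next_id + 1
        self.maps[map_id], self.refs[map_id] = {}, 1  # creator holds one FD
        return map_id

    def close_fd(self, map_id: int):
        self.refs[map_id] -= 1
        if self.refs[map_id] == 0:
            del self.maps[map_id], self.refs[map_id]  # data is gone for good

    def obj_pin(self, map_id: int, path: str):
        """Mirrors bpf_obj_pin(): the path itself now holds a reference."""
        self.pins[path] = map_id
        self.refs[map_id] += 1

    def obj_get(self, path: str) -> int:
        """Mirrors bpf_obj_get(): give the caller a new FD for the pinned map."""
        map_id = self.pins[path]
        self.refs[map_id] += 1
        return map_id


kernel = ToyKernel()

# Without pinning: the map dies with the process.
scratch = kernel.create_map()
kernel.close_fd(scratch)
print(scratch in kernel.maps)  # False

# With pinning: create, pin, exit, then reopen by path.
fd = kernel.create_map()
kernel.maps[fd]["blocked"] = 500_000
kernel.obj_pin(fd, "/sys/fs/bpf/my_packet_counter")
kernel.close_fd(fd)  # the app exits
fd2 = kernel.obj_get("/sys/fs/bpf/my_packet_counter")  # a restarted app
kernel.maps[fd2]["blocked"] += 1
print(kernel.maps[fd2]["blocked"])  # 500001
```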
4. Key Use Cases
A. Persisting Data Across Restarts
Imagine a firewall tool that counts blocked packets.
- Without Pinning: If you need to restart the userspace tool to update a setting, you lose all your historical counts.
- With Pinning: You restart the tool, it reconnects to the pinned map, and your counters resume from where they left off (e.g., 500,001 instead of 0).
B. Sharing Data Between Different Programs
Pinned maps allow totally different applications to share data.
- App A (Written in C): Updates a map with network traffic stats. Pins it to /sys/fs/bpf/traffic_stats.
- App B (Written in Python): Opens /sys/fs/bpf/traffic_stats to read that data and display a graph on a website.
- App C (a tc program loaded with iproute2): Uses the same map to make routing decisions.
5. Important Nuances
- It is not a "Real" File: Even though it appears under /sys/fs/bpf, it is not a regular file you can open in a text editor. It is a "pseudo-file" that acts as a handle for the BPF object; you work with it through the bpf() syscall or tools like bpftool.
- Reboots: Pinned maps survive application crashes and restarts, but they do not survive a system reboot. The data is still stored in RAM.
- Unpinning: To delete a pinned map, you simply delete the file (e.g., rm /sys/fs/bpf/my_packet_counter). Once the file is gone and no process or loaded program still holds a reference to the map, the kernel frees the memory.
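Here is a compact toy model of the unpinning rule. ToyPinnedMap is hypothetical. Removing the path drops only the pin's reference, and the map is freed once neither a pin nor an open FD remains. In the real kernel, loaded programs that use the map also keep it alive.

```python
class ToyPinnedMap:
    """Toy model of one pinned map: freed only when no pin and no FDs remain."""

    def __init__(self):
        self.pinned = True   # e.g. /sys/fs/bpf/my_packet_counter exists
        self.open_fds = 0    # FDs held by running tools
        self.freed = False

    def open(self):
        self.open_fds += 1

    def close(self):
        self.open_fds -= 1
        self._maybe_free()

    def unlink(self):
        """Like rm on the pinned path: removes the pin, not the map itself."""
        self.pinned = False
        self._maybe_free()

    def _maybe_free(self):
        if not self.pinned and self.open_fds == 0:
            self.freed = True


m = ToyPinnedMap()
m.open()        # a tool is still using the map
m.unlink()      # rm /sys/fs/bpf/my_packet_counter
print(m.freed)  # False: an FD still references it
m.close()
print(m.freed)  # True
```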
Summary Analogy
- Standard Map: A balloon you are holding by a string. If you walk away (exit the app), you drop the string, and the balloon floats away (is destroyed).
- Pinned Map: You tie the balloon's string to a fence post (/sys/fs/bpf). You can walk away, come back later, or let your friend hold the string; the balloon stays right where it is.
Where is eBPF used?
- BPF-LSM: A BPF-based Linux Security Module. It lets you attach BPF programs to LSM hooks to implement custom security policies.
- seccomp-bpf: This is a kernel feature that allows a process to restrict its own system calls. Seccomp filters are written in classic BPF (cBPF), not eBPF. A filter decides whether each syscall is allowed based on the syscall number and its raw argument values; it cannot follow pointer arguments. This is a key tool for container runtimes to provide a strong isolation boundary.
- eBPF-based observability tools: A huge part of security is being able to see what's happening on your system. Projects like Falco use eBPF to monitor system calls and other kernel events to detect anomalous behavior. These tools provide deep, in-kernel visibility with minimal performance overhead, which is a significant improvement over traditional auditing systems like auditd.
- Network security with eBPF: Tools like Cilium use eBPF to implement networking and security policies for containerized workloads. By operating directly in the kernel's networking data path, they can perform highly efficient packet filtering and enforce network policies with a deep understanding of application context.
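What is BTF?
BTF stands for BPF Type Format. It is a metadata format that describes the types (structs, unions, enums, function signatures) used by BPF programs and maps, and by the kernel itself. Think of it as a compact, streamlined version of the debug information (like DWARF) that compilers normally emit. libbpf relies on it for the "Compile Once, Run Everywhere" (CO-RE) relocations mentioned earlier. Tools such as bpftool also use it to show map contents with their real types.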
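BTF is a compact binary format. For a small, concrete look at it, the sketch below parses only the fixed 24-byte header defined by struct btf_header in the kernel's UAPI header linux/btf.h. It assumes little-endian data and does not decode the type records that follow. On kernels built with BTF support, the kernel's own types are exposed at /sys/kernel/btf/vmlinux, which the sketch reads if the file is present.

```python
import struct
from pathlib import Path

BTF_MAGIC = 0xEB9F

# Fixed header at the start of every BTF blob (struct btf_header in <linux/btf.h>):
#   __u16 magic; __u8 version; __u8 flags; __u32 hdr_len;
#   __u32 type_off; __u32 type_len; __u32 str_off; __u32 str_len;
# type_off/str_off are measured from the end of the header (hdr_len bytes in).
BTF_HEADER = struct.Struct("<HBBIIIII")  # 24 bytes; assumes little-endian data


def parse_btf_header(data: bytes) -> dict:
    """Decode only the BTF header; the type and string sections are not parsed."""
    if len(data) < BTF_HEADER.size:
        raise ValueError("too short to contain a BTF header")
    magic, version, flags, hdr_len, type_off, type_len, str_off, str_len = (
        BTF_HEADER.unpack_from(data)
    )
    if magic != BTF_MAGIC:
        raise ValueError(f"unexpected magic {magic:#x} (big-endian BTF not handled)")
    return {
        "version": version, "flags": flags, "hdr_len": hdr_len,
        "type_off": type_off, "type_len": type_len,
        "str_off": str_off, "str_len": str_len,
    }


if __name__ == "__main__":
    vmlinux = Path("/sys/kernel/btf/vmlinux")  # present on kernels built with BTF
    if vmlinux.exists():
        with vmlinux.open("rb") as f:
            print(parse_btf_header(f.read(BTF_HEADER.size)))
```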