Linux - Symbols

In Linux (and in compiled programs generally), a symbol is a name attached to a specific memory address. Symbols are the human-readable counterparts of the addresses where functions, variables, or objects are located within a program.

Think of a symbol as a label on a physical mailbox. The CPU only understands the "GPS coordinates" (the hex memory address), but developers and tools use the "label" (the symbol name) to know what lives at that address.

TL;DR:

Symbol = [Memory Address] + [Type Code] + [Human Name] + [Optional: Module]

What do Symbols Represent?

Symbols generally represent three things in a compiled program:

Functions: For example, the name main or printf.
Global Variables: Variables declared outside of functions that are accessible across files.
Static Variables: Variables restricted to a specific file or scope.

The Symbol Table

When you compile code (e.g., C or C++), the compiler creates a Symbol Table. This is a data structure stored inside the object file (.o) or the final executable (the ELF file in Linux).

The table contains:

Symbol Name: e.g., calculate_total.
Value: Usually the memory offset or address.
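Binding: Whether the symbol is Local (only visible in its own file) or Global (visible to the whole program). A third binding, Weak, behaves like Global but can be overridden by a Global definition of the same name.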
Type: Is it a function (code) or a variable (data)?
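
To make the fields above concrete, here is a minimal sketch that decodes one symbol table entry using the 64-bit ELF record layout (Elf64_Sym). It assumes a little-endian file, and the sample entry is hand-built for illustration rather than taken from a real binary.

```python
import struct

# Elf64_Sym layout (little-endian assumed here):
# st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
# st_value (u64), st_size (u64) -> 24 bytes in total.
ELF64_SYM = struct.Struct("<IBBHQQ")

BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def decode_symbol(raw: bytes) -> dict:
    """Decode one 24-byte Elf64_Sym record into readable fields."""
    name_off, info, _other, shndx, value, size = ELF64_SYM.unpack(raw)
    return {
        "name_offset": name_off,                      # index into the string table
        "binding": BINDINGS.get(info >> 4, "OTHER"),  # high 4 bits of st_info
        "type": TYPES.get(info & 0xF, "OTHER"),       # low 4 bits of st_info
        "section_index": shndx,                       # 0 means the symbol is undefined
        "value": value,
        "size": size,
    }


# Hypothetical entry: a global function at 0x1139 that is 42 bytes long.
sample = ELF64_SYM.pack(1, (1 << 4) | 2, 0, 14, 0x1139, 42)
entry = decode_symbol(sample)
print(entry)
```

Note that the name is stored as an offset rather than as text; the actual string lives in a separate string table.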

Defined vs. Undefined Symbols

This is a critical distinction for understanding how programs run:

Defined Symbols: These are functions or variables created within that specific file. (e.g., You wrote the code for void myFunc()).
Undefined Symbols: These are references to things that exist somewhere else. (e.g., You called printf, but printf is defined in the Standard C Library, not your file).

The Linker (ld) is the tool that matches "Undefined" symbols in one file with "Defined" symbols in another to create a working program.
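
The matching step can be sketched as a toy model. This is only an illustration of the idea, not how ld is implemented: real linkers also handle relocation, weak symbols, archive search order, and duplicate-definition errors. The object names and their symbol sets below are made up for the example.

```python
def resolve(objects: dict) -> tuple:
    """Map each undefined symbol to the object that defines it.

    `objects` maps an object name to {"defined": set, "undefined": set}.
    Returns (resolved, unresolved).
    """
    definitions = {}
    for obj_name, syms in objects.items():
        for sym in syms["defined"]:
            definitions[sym] = obj_name

    resolved, unresolved = {}, set()
    for syms in objects.values():
        for sym in syms["undefined"]:
            if sym in definitions:
                resolved[sym] = definitions[sym]
            else:
                unresolved.add(sym)   # a real linker would report an error here
    return resolved, unresolved


# Hypothetical inputs: two object files plus the C library.
objects = {
    "main.o": {"defined": {"main"}, "undefined": {"myFunc", "printf"}},
    "util.o": {"defined": {"myFunc"}, "undefined": set()},
    "libc": {"defined": {"printf"}, "undefined": set()},
}
resolved, unresolved = resolve(objects)
print(resolved)
print(unresolved)
```

If the "libc" entry were left out, printf would end up in the unresolved set, which corresponds to the familiar "undefined reference" link error.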

How to View Symbols in Linux

Linux provides several powerful command-line tools to inspect symbols inside binaries:

`nm` (The most common tool)

The nm command lists symbols in object files or executables. nm stands for "name list".

nm ./my_program

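Each output line carries a one-letter type code. Uppercase letters generally mark global symbols, while the lowercase forms (t, d, b) mark local ones. Common codes: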

T (Text): A defined function (code section).
U (Undefined): A symbol the program needs but doesn't have defined internally (it expects to find it in a library).
D (Data): An initialized global variable.
B (BSS): An uninitialized global variable.
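
As a rough sketch of how these codes can be read programmatically, the snippet below parses lines in the default nm output format. The sample lines are hand-written for illustration, not captured from a real binary, and only the four codes listed above are mapped.

```python
NM_CODES = {
    "T": "text (code)",
    "U": "undefined",
    "D": "initialized data",
    "B": "BSS (uninitialized data)",
}


def parse_nm_line(line: str) -> dict:
    """Parse one line of default `nm` output into its columns."""
    parts = line.split()
    if len(parts) == 2:
        # Undefined symbols have no address column.
        address = None
        code, name = parts
    else:
        address = int(parts[0], 16)
        code, name = parts[1], parts[2]
    return {
        "address": address,
        "code": code,
        "name": name,
        "meaning": NM_CODES.get(code.upper(), "other"),
        # Holds for pairs such as t/T, d/D, b/B; a few codes are special cases.
        "is_global": code.isupper(),
    }


SAMPLE = """\
0000000000004010 D counter
0000000000004018 B buffer
0000000000001139 T main
                 U printf
0000000000001129 t helper
"""

for line in SAMPLE.splitlines():
    print(parse_nm_line(line))
```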

`readelf`

Since Linux binaries use ELF (Executable and Linkable Format), readelf gives a more detailed view, including each symbol's size, type, and binding.

readelf -s ./my_program

`objdump`

objdump is used for looking at the "internals" of a binary, including its symbols (-t) and the disassembled code (-d).

objdump -t ./my_program

Stripped vs. Unstripped Binaries

When you download a production program (like ls or grep), it is often stripped.

Unstripped: Contains the full symbol table. This is great for debugging (so you can see function names in a crash report).
Stripped: The symbol table has been removed to save space and make "reverse engineering" slightly harder.

If you run nm on a stripped binary, it reports "no symbols". The dynamic symbols survive stripping, so nm -D can still list them.

Dynamic Symbols

In modern Linux, most programs use Shared Libraries (.so files). A dynamically linked program doesn't actually contain the code for printf. Instead, it has a Dynamic Symbol that tells the Linux "Dynamic Loader" (ld.so) to find the printf symbol inside the C library at runtime (for example /lib/libc.so.6; the exact path varies by distribution and architecture).
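
Looking up a symbol by name at runtime can be tried from Python with ctypes, which uses the same loader interface (dlopen and dlsym). This is an analogy rather than the exact mechanism a compiled program uses for its own calls, and it assumes a typical Linux system where the C library is already loaded into the Python process.

```python
import ctypes
import os

# Passing None asks for a handle covering the symbols already loaded into
# this process, which on a typical Linux system includes the C library.
process = ctypes.CDLL(None)

# Attribute access performs a by-name symbol lookup at runtime.
getpid = process.getpid
getpid.restype = ctypes.c_int

# Both values should match: one comes from the symbol we looked up by name,
# the other from Python's own wrapper.
print(getpid(), os.getpid())
```

Asking for a name that does not exist raises AttributeError, which is the runtime counterpart of an unresolved symbol.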

Where are the symbols stored?

In Linux, symbols are stored in different locations depending on whether the code is sitting on your disk (as a file) or running in memory (as a process or the kernel).

Here are the four primary places where symbols are stored:

1. Inside ELF Files (Object Files, Executables, and Libraries)

The most common place for symbols is inside the ELF (Executable and Linkable Format) file itself. Within an ELF file, symbols are stored in specific "sections":

.symtab (Symbol Table): This contains all the symbols needed to locate and relocate a program's symbolic definitions and references. It is usually quite large and is often stripped (removed) from production binaries to save space.
.dynsym (Dynamic Symbol Table): This contains only the symbols needed for dynamic linking (e.g., calls to printf in libc.so). Unlike .symtab, this section cannot be stripped if the program is to run, because the dynamic loader needs it to connect the program to shared libraries at runtime.
.strtab and .dynstr: Since the symbol tables above only contain numbers and offsets, these "String Tables" store the actual text names (like "main" or "my_variable").

How to see them:

readelf -s /bin/ls
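
The relationship between a symbol entry and its string table can be sketched in a few lines: the entry stores a byte offset, and the name is the NUL-terminated string starting there. The string table below is hand-built for illustration.

```python
def name_at(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset`."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")


# A tiny hand-built string table. ELF string tables begin with a NUL byte,
# so offset 0 always yields the empty name.
strtab = b"\x00main\x00my_variable\x00"

print(name_at(strtab, 1))   # main
print(name_at(strtab, 6))   # my_variable
```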

2. In the Linux Kernel (`/proc/kallsyms`)

The kernel doesn't work like a standard user-space program. While the kernel binary exists on disk (usually in /boot/vmlinuz-...), the symbols currently loaded in the running kernel are managed in a special way:

The __ksymtab section: When the kernel is compiled, the build places the symbols exported to modules (those marked with EXPORT_SYMBOL) in this table.
/proc/kallsyms: This is a virtual file. It doesn't exist on your hard drive; the kernel generates it on the fly when you read it. It lists the symbols currently available in the kernel, including those from loaded Kernel Modules (.ko files). Data symbols are included only when the kernel is built with CONFIG_KALLSYMS_ALL.
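
Each line of /proc/kallsyms follows the pattern address, type code, name, and an optional module name in square brackets. A minimal parsing sketch follows; the sample addresses and the module line are invented for illustration.

```python
from typing import NamedTuple, Optional


class KernelSymbol(NamedTuple):
    address: int
    type_code: str
    name: str
    module: Optional[str]   # None for symbols built into the kernel image


def parse_kallsyms_line(line: str) -> KernelSymbol:
    """Split one kallsyms line into address, type, name and optional module."""
    fields = line.split()
    module = fields[3].strip("[]") if len(fields) > 3 else None
    return KernelSymbol(int(fields[0], 16), fields[1], fields[2], module)


# Illustrative sample; real addresses differ on every system and boot.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff810a1b20 T commit_creds
ffffffffc0a3b000 t ext4_init_fs\t[ext4]
"""

for line in SAMPLE.splitlines():
    print(parse_kallsyms_line(line))
```

On a real system the same function could be applied to each line read from /proc/kallsyms.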

3. Separate Debug Files (`/usr/lib/debug`)

To keep download sizes small, most Linux distributions (Ubuntu, Fedora, etc.) strip symbols from their main binaries. They provide "Debug Symbols" in separate packages (e.g., libc6-dbg or python3-dbgsym).

These symbols are stored in:

/usr/lib/debug/: This directory mimics the root file system. For example, the symbols for /bin/bash might be found at /usr/lib/debug/bin/bash.debug. Some distributions instead index these files by build ID under /usr/lib/debug/.build-id/.
DWARF format: This is the standard debugging data format used inside these files, which maps machine code back to the original source code line numbers.
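
The path mapping can be sketched as follows. The first function mirrors the layout described above. The second shows the build-ID layout, which is an addition beyond the text above: debuggers such as GDB look for a file named after the binary's build ID, with the first two hex digits used as a directory name. The build ID in the example is a made-up placeholder.

```python
from pathlib import PurePosixPath

DEBUG_ROOT = PurePosixPath("/usr/lib/debug")


def path_based_debug_file(binary: str) -> PurePosixPath:
    """Mirror the binary's absolute path under the debug root and add .debug."""
    return DEBUG_ROOT / (binary.lstrip("/") + ".debug")


def build_id_debug_file(build_id: str) -> PurePosixPath:
    """Build-ID layout: first two hex digits form a directory, the rest the file."""
    return DEBUG_ROOT / ".build-id" / build_id[:2] / (build_id[2:] + ".debug")


print(path_based_debug_file("/bin/bash"))
# Placeholder build ID; real ones are longer hex strings.
print(build_id_debug_file("abcdef1234567890"))
```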

4. System Map (`/boot/System.map`)

When you compile a kernel, a file called System.map is created. This is a static text file that acts as a "phone book" for that specific kernel version.

Location: /boot/System.map-$(uname -r)
Purpose: It is used by developers and tools to look up kernel addresses when the kernel itself might not be running or is in a crashed state (Kernel Panic).
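
The "phone book" lookup can be sketched like this: given an address from a crash report, find the nearest symbol at or below it and report the offset. The map contents are invented for illustration (example_init is a hypothetical name), and the sketch ignores symbol sizes, so an address past the last symbol is attributed to that last symbol.

```python
import bisect

# Illustrative System.map excerpt: address, type code, name.
SYSTEM_MAP = """\
ffffffff81000000 T _text
ffffffff81001000 T example_init
ffffffff81002000 T commit_creds
ffffffff81003000 T _etext
"""


def load_map(text: str) -> list:
    """Parse System.map text into a list of (address, name), sorted by address."""
    entries = []
    for line in text.splitlines():
        addr, _type_code, name = line.split()
        entries.append((int(addr, 16), name))
    entries.sort()
    return entries


def symbolize(entries: list, address: int):
    """Return 'name+0xoffset' for the closest symbol at or below `address`."""
    addresses = [addr for addr, _name in entries]
    idx = bisect.bisect_right(addresses, address) - 1
    if idx < 0:
        return None   # address lies below the first known symbol
    base, name = entries[idx]
    return f"{name}+0x{address - base:x}"


entries = load_map(SYSTEM_MAP)
print(symbolize(entries, 0xFFFFFFFF81002010))   # commit_creds+0x10
```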

Why is the first column of `cat /proc/kallsyms` all zeros?

The first column of /proc/kallsyms is the memory address where that specific symbol (function or variable) is located in the kernel's virtual address space.

If you are seeing a column of 16 zeros (e.g., 0000000000000000), you've stumbled upon one of the kernel's most important modern security features. It’s not that the addresses don't exist; it's that the kernel is hiding them from you.

This is controlled by a kernel setting called kptr_restrict.

The Reason: Security (KASLR)

If an attacker knows the exact memory address of a kernel function (like commit_creds), they can use that information to craft an exploit (for example, a ROP chain delivered through a buffer overflow) to gain root access.

By hiding these addresses from unprivileged users, the kernel makes vulnerabilities much harder to exploit. This complements KASLR (Kernel Address Space Layout Randomization), which randomizes where the kernel is loaded at boot: randomization only helps if the resulting addresses are not leaked.

How to see the real addresses

To see the actual addresses, you usually need root privileges. Try running the command with sudo:

sudo cat /proc/kallsyms | head

If it still shows zeros even with sudo, your system has a higher level of restriction enabled (most likely kptr_restrict set to 2, described below).
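
A quick way to tell whether you are looking at hidden addresses is to check if every address column is zero. The helper below is a small sketch run against hand-written sample lines; on a real system the same function could be fed the first lines of /proc/kallsyms.

```python
def addresses_hidden(lines) -> bool:
    """Return True if every non-empty line starts with an all-zero address."""
    addresses = [int(line.split()[0], 16) for line in lines if line.strip()]
    return bool(addresses) and all(addr == 0 for addr in addresses)


# Illustrative samples: what a restricted and an unrestricted reader might see.
HIDDEN = [
    "0000000000000000 T _text",
    "0000000000000000 T commit_creds",
]
VISIBLE = [
    "ffffffff81000000 T _text",
    "ffffffff810a1b20 T commit_creds",
]

print(addresses_hidden(HIDDEN))    # True
print(addresses_hidden(VISIBLE))   # False
```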

Understanding `kptr_restrict`

The behavior of these addresses is governed by the file /proc/sys/kernel/kptr_restrict. You can check its value by running:

cat /proc/sys/kernel/kptr_restrict

The value will be 0, 1, or 2:

0 (Unrestricted): kptr_restrict itself hides nothing. This is the upstream kernel default. For /proc/kallsyms, unprivileged users may still see zeros unless other settings (such as perf_event_paranoid) permit access.
1 (Restricted): Pointers are hidden (zeroed out) for normal users. Only users with the CAP_SYSLOG capability (usually root/sudo) can see them. Some distributions, such as Ubuntu, ship this as their default.
2 (Always hidden): Pointers are hidden from everyone, including the root user. This is common on highly secured servers or mobile OSs like Android.
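
The check can also be scripted. The sketch below reads the setting and prints a short description of the level; the description strings are paraphrases written for this example, and the function returns None when the file cannot be read (for example, on a non-Linux system).

```python
from pathlib import Path
from typing import Optional

# Short paraphrases of the three levels, written for this example.
LEVELS = {
    0: "no restriction from kptr_restrict itself",
    1: "hidden unless the reader has CAP_SYSLOG",
    2: "hidden from everyone, including root",
}


def read_kptr_restrict(path: str = "/proc/sys/kernel/kptr_restrict") -> Optional[int]:
    """Return the current level, or None if the file is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(level: Optional[int]) -> str:
    """Turn a level into a human-readable description."""
    if level is None:
        return "unknown (setting not readable)"
    return LEVELS.get(level, f"unexpected value {level}")


print(describe(read_kptr_restrict()))
```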

How to change it (Temporary)

If you are debugging or doing kernel development and you need to see those addresses as root, you can change the restriction level:

To allow root to see addresses (Set to 1):

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict

To allow everyone to see addresses (Set to 0 - Not recommended):

echo 0 | sudo tee /proc/sys/kernel/kptr_restrict

Note: This change reverts to the default after a reboot unless you persist it, for example by adding kernel.kptr_restrict = 1 to /etc/sysctl.conf or to a file under /etc/sysctl.d/.