Linux - ELF

In the Linux world, ELF stands for Executable and Linkable Format. It is the standard file format for executables, object code, shared libraries, and core dumps.

Before ELF, Linux used the older a.out format (COFF played a similar role on other Unix systems), but ELF became the standard because it is flexible, extensible, and not tied to any single CPU architecture.

If I compile a C++ program, does it produce an ELF file?

Yes, if you are on Linux.

When you compile a C++ program using a compiler like g++ or clang++ on a Linux system, the output is an ELF file.

The Compilation Journey

When you run a command like g++ main.cpp -o my_app, the compiler actually creates two different types of ELF files during the process:

  1. Relocatable ELF (.o file): First, the compiler turns your C++ code into "Object Files." If you run g++ -c main.cpp, it produces main.o. This is an ELF file, but it cannot "run" yet because its references to library symbols (like std::cout) are still unresolved and its code has not been assigned final addresses.
  2. Executable ELF (The final app): The "Linker" takes your .o files and combines them with system libraries to create the final executable. This is the ELF file you actually run.

How to Prove It

You can verify this yourself using the file command on any compiled C++ binary:

# 1. Compile a simple C++ file
g++ main.cpp -o my_app

# 2. Check the file type
file my_app

The output will look something like this:

my_app: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, ...

Notice the first word in the description is ELF.

Does it produce ELF on other systems?

Not everywhere. ELF is used by Linux and most other Unix-like systems (such as FreeBSD and Solaris), and by many embedded toolchains, but not by Windows or macOS.

If you compile the exact same C++ code on different Operating Systems, you get different formats:

  • Windows: Produces a PE (Portable Executable) file (usually ending in .exe).
  • macOS: Produces a Mach-O file.
  • Linux: Produces an ELF file.

What’s inside a C++ ELF vs. a C ELF?

Structurally, a C++ ELF file is exactly the same as a C ELF file. However, if you look at the Symbol Table (.symtab), you will notice a big difference: Name Mangling.

  • In C, a function name in the ELF symbol table looks like the code: my_function.
  • In C++, because of features like function overloading and namespaces, the names are "mangled" to encode extra information. A function like print(int) might appear in the ELF symbol table as _Z5printi.
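
To make the mangling rule concrete, here is a minimal sketch of how the Itanium C++ ABI (the scheme used by g++ and clang++ on Linux) encodes a plain, non-namespaced function that takes builtin parameter types. The helper name mangle_simple and the small type table are illustrative assumptions; a real mangler also handles namespaces, templates, pointers, and much more.

```python
# Sketch only: covers free functions with a few builtin parameter types.
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "int": "i",
    "unsigned int": "j", "long": "l", "float": "f", "double": "d",
}

def mangle_simple(name, param_types):
    """Return the mangled symbol for name(param_types...)."""
    # An empty parameter list is encoded as a single 'v' (void).
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    # _Z prefix, then the length of the name, the name, and the parameter codes.
    return f"_Z{len(name)}{name}{params}"

print(mangle_simple("print", ["int"]))     # _Z5printi
print(mangle_simple("print", ["double"]))  # _Z5printd
```

Because the parameter types are part of the symbol, the two overloads of print get different names, which is what lets the linker tell them apart.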

What about other languages like Rust or Go?

Regardless of the language, if you are compiling for Linux, the final output will almost always be an ELF file.

However, Go and Rust handle the "insides" of the ELF file differently than C++ does.

Rust (The "C++ Style" ELF)

Rust is designed to be highly compatible with the C/C++ ecosystem. It uses the LLVM compiler backend, which is the same one used by Clang.

  • Format: A standard ELF file.
  • Dependencies: By default, Rust links dynamically to the system's C library (libc). If you run ldd on a Rust binary, you will typically see libc.so.6 listed (for example, /lib/x86_64-linux-gnu/libc.so.6 on Debian-based systems).
  • Sections: It uses the standard .text, .data, and .rodata sections.
  • Symbols: Like C++, Rust uses Name Mangling. Because Rust has namespaces and generics, a function like hello::print won't appear as "print" in the symbol table; it will look like _ZN5hello5print17h6....
  • Safety Metadata: Rust often includes extra sections for stack unwinding (to handle panic! calls safely).

Go (The "Unique" ELF)

Go is the "rebel" in the ELF world. Because Go was designed to be easily portable and fast to deploy, it does things differently.

  • Static by Default: Historically, Go preferred to be statically linked. This means a Go ELF file often contains every single library it needs inside the file itself. It usually does not depend on the system's libc.
  • Binary Size: Because it includes its own "Runtime" (for garbage collection and goroutine management), a Go ELF file is typically much larger than an equivalent dynamically linked C or C++ binary. A "Hello World" in Go might be around 2MB, while in C it is around 16KB.
  • Custom Sections: Go includes specialized sections that other languages don't use:
    • .gopclntab: A large table containing function names and line numbers. This is why Go can give you such detailed "stack traces" when a program crashes.
    • .go.buildinfo: Contains information about the Go version used to build it and the modules included.
  • Symbol Table: Go names symbols in a very readable way in the symbol table, like main.main or runtime.mallocgc.

The Structure of an ELF File

An ELF file is divided into four main parts:

  1. ELF Header: The "table of contents." It describes whether the file is 32-bit or 64-bit, the CPU architecture (like x86_64 or ARM), and the entry point (where the program starts).
  2. Program Header Table: Tells the system how to create a process image. It is essential for executing the file.
  3. Section Header Table: Contains descriptions of the various sections of the file. It is essential for linking and debugging.
  4. Data: This is where the actual code and data reside.
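
As a rough illustration of the "table of contents" role, the sketch below decodes the fixed 64-byte header of a 64-bit little-endian ELF file with Python's struct module. The field order follows the Elf64_Ehdr layout; the function name parse_elf64_header and the sample values (entry point, counts, offsets) are made up for the demonstration and are not taken from any real binary.

```python
import struct

# Little-endian Elf64_Ehdr: 16-byte e_ident followed by 13 fixed-width fields.
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")

def parse_elf64_header(data):
    """Decode the leading 64 bytes of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian files")
    (_ident, e_type, machine, _version, entry, phoff, shoff, _flags,
     _ehsize, phentsize, phnum, shentsize, shnum, shstrndx) = EHDR64.unpack_from(data)
    return {
        "type": e_type, "machine": machine, "entry": entry,
        "phoff": phoff, "phentsize": phentsize, "phnum": phnum,
        "shoff": shoff, "shentsize": shentsize, "shnum": shnum,
        "shstrndx": shstrndx,
    }

# Synthetic header: class=2 (64-bit), data=1 (little-endian), version=1.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
# type 3 = ET_DYN, machine 62 = EM_X86_64; the remaining numbers are invented.
sample = EHDR64.pack(ident, 3, 62, 1, 0x1040, 64, 0x3000, 0, 64, 56, 13, 64, 30, 29)
hdr = parse_elf64_header(sample)
print(hex(hdr["entry"]), hdr["phnum"], hdr["shnum"])
```

On a Linux machine you could pass the first 64 bytes of a real binary instead of the synthetic sample, assuming the file is 64-bit and little-endian. The phoff and shoff fields are what lead to the Program Header Table and Section Header Table described above.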

Common Sections in ELF

ELF files are organized into "sections" that separate different types of information. Here are the most common ones you will encounter:

  • .text: Contains the actual executable machine code. This section is usually read-only to prevent the program from modifying its own instructions.
  • .data: Contains initialized global and static variables. For example, if you write int x = 10; outside a function, it lives here.
  • .bss: Block Started by Symbol. Contains uninitialized (and zero-initialized) global and static variables. It doesn't take up space on the disk; it just tells the OS to allocate memory and fill it with zeros when the program starts.
  • .rodata: (Read-Only Data) Contains constants, such as hardcoded strings (e.g., "Hello, World!").
  • .symtab: The Symbol Table. It stores information to locate and relocate a program's symbolic definitions and references (function names, variable names).
  • .strtab: The String Table. Most of the names used in the symbol table are stored here as raw strings.
  • .plt and .got: These handle Dynamic Linking. They allow the program to call functions in external libraries (like printf in libc.so).
  • .interp: Specifies the path to the dynamic linker (e.g., /lib64/ld-linux-x86-64.so.2) that must be run to start the program.

File Suffixes (Extensions)

Unlike Windows, Linux does not strictly rely on file extensions to know if a file is executable (it uses "Permissions" instead). However, by convention, ELF files usually use these suffixes:

  1. No Extension: Most standard command-line tools and compiled binaries have no extension (e.g., /bin/ls, /bin/bash, or your own compiled my_program).
  2. .o (Object Files): These are "intermediate" ELF files created by a compiler but not yet linked into a final program.
  3. .so (Shared Objects): These are Linux "Shared Libraries" (the equivalent of a Windows .dll). They are loaded at runtime.
  4. .ko (Kernel Objects): These are ELF files used as Linux Kernel Modules (drivers).
  5. .elf: Sometimes used in embedded systems development to explicitly identify the format.
  6. .cgi: Occasionally used for CGI programs on web servers; these can be scripts or compiled ELF binaries.

How to tell a file is an ELF file?

The easiest method is the file command, which analyzes the file header to determine its type.

  • Command: file <filename>
  • Result: If it is an ELF file, the output will explicitly start with "ELF," followed by architecture details (e.g., 64-bit), linking information, and target OS.
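
The core of that check is small enough to sketch by hand: every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F', followed by a class byte (32-bit or 64-bit) and a byte-order byte. The function name describe_ident is an assumption for this example, and the real file command inspects far more than these six bytes.

```python
ELF_MAGIC = b"\x7fELF"

def describe_ident(first_bytes):
    """Return a short description if the bytes begin an ELF file, else None."""
    if len(first_bytes) < 6 or first_bytes[:4] != ELF_MAGIC:
        return None
    bits = {1: "32-bit", 2: "64-bit"}.get(first_bytes[4], "unknown class")
    order = {1: "LSB", 2: "MSB"}.get(first_bytes[5], "unknown byte order")
    return f"ELF {bits} {order}"

# Synthetic inputs: an ELF identification block, a Windows PE stub, a script.
print(describe_ident(b"\x7fELF\x02\x01\x01" + bytes(9)))  # ELF 64-bit LSB
print(describe_ident(b"MZ\x90\x00\x03\x00"))              # None
print(describe_ident(b"#!/bin/sh\n"))                     # None
```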

How to Inspect an ELF File

If you want to see the "insides" of an ELF file on your Linux system, you can use these standard tools (readelf, nm, and objdump are part of GNU binutils):

  • file <filename>: As mentioned above. Confirms if the file is an ELF file and what architecture it's for.
  • readelf -h <filename>: Shows the ELF Header.
  • readelf -S <filename>: Lists all the Sections.
  • nm <filename>: Lists the symbols (functions and variables) inside the file.
  • objdump -d <filename>: Disassembles the .text section back into assembly code.

How is a symbol different from a string?

The String

In an ELF file, strings are just sequences of characters ending in a null byte (\0). They have no "intelligence" or metadata attached to them.

  • Where they live: Mostly in the .strtab (String Table) or .dynstr (Dynamic String Table) sections.
  • What they are: A giant "blob" of text. For example: printf\0main\0my_variable\0.
  • Purpose: To save space. Instead of repeating the name "main" everywhere, the ELF file stores "main" once in the string table, and everything else just points to the "index" where that word starts.
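
A small sketch of that lookup, assuming a hand-built string table (real tables also begin with a single null byte, so offset 0 is the empty name). The helper name name_at is illustrative.

```python
def name_at(strtab, offset):
    """Read the null-terminated name that starts at the given byte offset."""
    end = strtab.index(b"\x00", offset)
    return strtab[offset:end].decode("ascii")

# Synthetic table: a leading null byte, then three names back to back.
strtab = b"\x00printf\x00main\x00my_variable\x00"

print(name_at(strtab, 1))   # printf
print(name_at(strtab, 8))   # main
print(name_at(strtab, 13))  # my_variable
```

Because a name is only "an offset plus everything up to the next null byte," an offset may also point into the middle of an existing string: name_at(strtab, 10) yields "in" here, which is how linkers can share suffixes between names.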

The Symbol

A Symbol is a fixed-size structure that describes a programming entity (like a function or a global variable). It is much more than just a name; it is a "file card" that tells the computer what that entity is and where it lives in memory.

  • Where they live: In the .symtab (Symbol Table) or .dynsym (Dynamic Symbol Table).
  • What they contain: A symbol structure (e.g., Elf64_Sym) contains several fields:
    1. Name (Index): A pointer (offset) into the String Table. (This is the link between the two!)
    2. Value: The memory address of the symbol.
    3. Size: How many bytes the symbol occupies (e.g., the size of a variable).
    4. Type: Is it a Function? An Object (variable)? A Section?
    5. Binding: Is it Local (only visible in this file) or Global (visible to the whole program)?

Comparison table

Feature      String (in .strtab)                 Symbol (in .symtab)
-----------  ----------------------------------  -----------------------------------------------
Content      Raw text (e.g., "my_func")          A data structure (metadata)
Size         Variable length                     Fixed size (e.g., 24 bytes on 64-bit)
Information  Only the name                       Address, Size, Type, Scope, and Name
Purpose      Human-readable identification       Helping the Linker/OS find and connect code
Analogy      The text printed on a book's spine  The entry for that book in the library computer

How they work together

Imagine you have a function in your C code called calculate_total.

  1. The String Table (.strtab) will contain the literal text: ...calculate_total\0...
  2. The Symbol Table (.symtab) will have an entry that looks like this:
    • Name index: 450 (Points to where "calculate_total" starts in the string table).
    • Type: STT_FUNC (It's a function).
    • Value: 0x4010a0 (The memory address where the function code starts).
    • Size: 120 bytes (The length of the function's machine code).
    • Binding: STB_GLOBAL (Other files are allowed to call this function).
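
The entry above can be reproduced as a minimal sketch: pack one 24-byte Elf64_Sym record and decode it again, resolving the name through the string table. Type and binding share a single byte (st_info), with the binding in the high four bits and the type in the low four. The padding in the string table and the section index 14 are invented so that the numbers match the example; decode_symbol is a hypothetical helper name.

```python
import struct

# Little-endian Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size.
SYM64 = struct.Struct("<IBBHQQ")
TYPES = {0: "STT_NOTYPE", 1: "STT_OBJECT", 2: "STT_FUNC", 3: "STT_SECTION", 4: "STT_FILE"}
BINDS = {0: "STB_LOCAL", 1: "STB_GLOBAL", 2: "STB_WEAK"}

def decode_symbol(raw, strtab):
    """Decode one symbol record and look its name up in the string table."""
    st_name, st_info, _other, _shndx, st_value, st_size = SYM64.unpack(raw)
    end = strtab.index(b"\x00", st_name)
    return {
        "name": strtab[st_name:end].decode("ascii"),
        "type": TYPES.get(st_info & 0xF, "other"),   # low 4 bits
        "bind": BINDS.get(st_info >> 4, "other"),    # high 4 bits
        "value": st_value,
        "size": st_size,
    }

# 450 filler bytes so that the name really starts at index 450.
strtab = bytes(450) + b"calculate_total\x00"
# st_info = (STB_GLOBAL << 4) | STT_FUNC; section index 14 is arbitrary.
raw = SYM64.pack(450, (1 << 4) | 2, 0, 14, 0x4010A0, 120)
sym = decode_symbol(raw, strtab)
print(sym["name"], sym["type"], sym["bind"], hex(sym["value"]), sym["size"])
```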

Why separate them?

Efficiency. If the ELF format put the actual text name inside the Symbol Table, every symbol entry would be a different size. This would make it very slow for the operating system to "search" the table.

By keeping the Symbols a fixed size, the OS can jump directly to the 100th symbol by doing a simple math calculation (100 * size_of_symbol). It only goes to the String Table when it actually needs to display or match the name.
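
The arithmetic can be sketched directly. The table below is synthetic (200 empty records whose value field simply stores their own index), and symbol_offset and read_symbol are illustrative names; the point is only that entry N is found by multiplication, without scanning the entries before it.

```python
import struct

SYM64 = struct.Struct("<IBBHQQ")  # one Elf64_Sym record, 24 bytes

def symbol_offset(index):
    """Byte offset of the index-th entry in a table of fixed-size records."""
    return index * SYM64.size

def read_symbol(symtab, index):
    """Unpack the index-th record directly, with no scan of earlier entries."""
    return SYM64.unpack_from(symtab, symbol_offset(index))

# Synthetic table: st_value of each entry is set to the entry's own index.
symtab = b"".join(SYM64.pack(0, 0, 0, 0, i, 0) for i in range(200))

print(symbol_offset(100))            # 2400
print(read_symbol(symtab, 100)[4])   # 100 (the st_value field)
```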

How to see this in action

You can list the symbols using readelf:

readelf -s my_program

You will see columns including "Value," "Size," "Type," "Bind," and "Name." The "Name" shown is actually the tool looking up the index in the string table for you.

.dynsym vs .symtab

.dynsym stands for the Dynamic Symbol Table. To understand .dynsym, you have to compare it to the regular .symtab (Symbol Table). While both hold symbols, they serve very different purposes in the life of a program.

The Core Difference: Performance vs. Debugging

  • .symtab (The Big Table): Contains every symbol needed to link and debug the program. This includes file-local (static) functions and variables, source file names, and internal functions. It is used by the Linker (ld) at link time.
  • .dynsym (The Slim Table): Contains only the symbols needed for Dynamic Linking at runtime. This includes functions imported from shared libraries (like printf from libc.so) or symbols exported by a library for others to use.

Why do we need a separate table?

When you run a program, the Dynamic Linker (the user-space helper, such as ld-linux-x86-64.so.2, that loads the program's shared libraries) needs to quickly find where printf or malloc is located in memory.

If the OS had to search through the massive .symtab (which could contain thousands of unnecessary local symbols used only for debugging), starting a program would be very slow.

.dynsym is an optimized, "VIP-only" version of the symbol table. It only contains what is strictly necessary to get the program running.

The "Stripping" Factor

This is the most practical difference for Linux users:

  • .symtab can be removed: You can run the command strip --strip-all my_program. This deletes the .symtab to make the file size smaller. The program will still run perfectly fine because the OS doesn't use .symtab to execute code.
  • .dynsym cannot be removed: If you delete the .dynsym section, the program will crash or fail to start. The OS will no longer know how to connect your code to the external libraries it needs.

Comparison

Feature           .symtab (Symbol Table)            .dynsym (Dynamic Symbol Table)
----------------  --------------------------------  ------------------------------------
Purpose           Static linking and debugging.     Dynamic linking at runtime.
Scope             Global AND Local symbols.         Only Global/External symbols.
Size              Large (contains everything).      Small (minimalist).
Loaded to RAM?    No (usually stays on disk).       Yes (loaded into memory at runtime).
Can be stripped?  Yes (often done for production).  No (required for execution).

How to see it in action

You can see the difference using the nm command.

  1. To see the regular Symbol Table:

    nm my_program
    

    (If the file is "stripped," this will return an error or be empty.)

  2. To see the Dynamic Symbol Table:

    nm -D my_program
    

    (This will always show symbols, even if the file is stripped, as long as it uses shared libraries.)

Alternatively, using readelf:

readelf -s my_program # Shows both, but labeled separately

What is .rodata?

.rodata stands for Read-Only Data.

What goes into .rodata?

The .rodata section stores constants that are defined at compile time and should never be changed while the program is running. Common examples include:

  • String Literals: For example, printf("Hello, World!\n");. The string "Hello, World!\n" is stored in .rodata.
  • Global/Static Constants: Variables declared with the const keyword in C/C++ (e.g., const int MAX_USERS = 100;).
  • Jump Tables: Sometimes used by compilers to optimize switch statements.

How it works in Memory

When you run a program, the Linux loader (part of the kernel and ld-linux.so) maps the ELF file into RAM.

  1. Memory Protection: The kernel sets up the page tables so that the CPU's Memory Management Unit (MMU) treats the memory pages containing the .rodata section as Read-Only (R).
  2. Security: If the program attempts to write to a memory address located in the .rodata section, the CPU triggers a hardware exception, and the OS kills the process with a Segmentation Fault (SIGSEGV). This prevents accidental bugs or malicious exploits from modifying constant data.
  3. Efficiency: Since the data is read-only, multiple instances of the same program can share the same physical memory pages for .rodata, saving RAM.

Comparison with other sections

To understand .rodata, it helps to see it alongside its "neighbors" in an ELF file:

Section  Content                                                  Permissions
-------  -------------------------------------------------------  --------------
.text    Machine code (instructions)                              Read + Execute
.rodata  Constants, String literals                               Read Only
.data    Initialized global/static variables (e.g., int x = 10;)  Read + Write
.bss     Uninitialized global/static variables (e.g., int y;)     Read + Write

Code Example

Consider this C code:

#include <stdio.h>

const int age = 30;         // Stored in .rodata
char *name = "John Doe";    // The pointer 'name' is in .data,
                            // but the string "John Doe" is in .rodata

int main(void) {
    printf("Name: %s, Age: %d\n", name, age);
    // age = 31;            // Compile error: assignment to a const object
    // name[0] = 'R';       // Undefined behavior; on Linux this typically
                            // ends in a SEGMENTATION FAULT at runtime
    return 0;
}

How to view .rodata in a file

If you want to see the .rodata section of a compiled binary on Linux, you can use several command-line tools:

  • Using readelf: To see the section headers and verify .rodata exists:

    readelf -S my_program | grep '\.rodata'
    
  • Using objdump: To see the actual content (hex and ASCII) of the .rodata section:

    objdump -s -j .rodata my_program
    
  • Using nm: To see the symbols and which section they are assigned to (the R or r tag indicates read-only data):

    nm my_program | grep -i ' r '
    

What is PHDR?

PHDR stands for Program Header.

  • Section Headers: for the compiler and linker to organize the code.
  • Program Headers: for the Operating System Kernel to actually run the code.

The "Execution View"

An ELF file has two "views":

  1. Linking View (Sections): Used during build time. It focuses on things like .text, .data, and .symtab.
  2. Execution View (Segments): Used during runtime. The Program Header Table defines these Segments.

When you type ./my_program, the Linux kernel doesn't care about "sections." It looks at the Program Headers to understand how to map the file into RAM.

What information is in a PHDR?

Each entry in the Program Header table describes a Segment. It tells the kernel:

  • Type: What kind of segment is this? (Is it loadable? Is it for the dynamic linker?)
  • Offset: Where does this segment start in the file on the disk?
  • VirtAddr: What memory address should this segment be loaded into in RAM?
  • FileSiz / MemSiz: How big is it on disk vs. how big should it be in RAM? (If MemSiz is larger than FileSiz, the extra space is filled with zeros—this is how .bss is handled).
  • Flags: What are the permissions?
    • R (Read)
    • W (Write)
    • E (Execute)
  • Align: Memory alignment requirements.
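
To show how these fields fit together, the sketch below decodes one 56-byte Elf64_Phdr record and derives the zero-filled tail (MemSiz minus FileSiz), which is where .bss ends up. The sample numbers describe a made-up read/write data segment, and decode_phdr is a hypothetical helper name. Note that in the 64-bit layout the flags field comes second, before the offset.

```python
import struct

# Little-endian Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align.
PHDR64 = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4

def decode_phdr(raw):
    """Decode one program header and compute the zero-filled tail size."""
    p_type, flags, offset, vaddr, _paddr, filesz, memsz, align = PHDR64.unpack_from(raw)
    perms = "".join(
        letter if flags & bit else " "
        for letter, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X))
    )
    return {
        "type": p_type, "perms": perms, "offset": offset, "vaddr": vaddr,
        "filesz": filesz, "memsz": memsz, "align": align,
        "zero_fill": memsz - filesz,  # bytes the loader must zero (e.g. .bss)
    }

# Invented data segment: 0x258 bytes in the file, 0x268 bytes in memory.
raw = PHDR64.pack(PT_LOAD, PF_R | PF_W, 0x2DB8, 0x3DB8, 0x3DB8, 0x258, 0x268, 0x1000)
seg = decode_phdr(raw)
print(repr(seg["perms"]), seg["zero_fill"])  # 'RW ' 16
```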

Common Segment Types

When you inspect an ELF file's program headers, you will see these common types:

  • PT_LOAD: (the PT_ prefix marks a program header type) This is the most critical type. It tells the kernel to map a piece of the file into memory. A typical program has at least two:
    • One for code (Flags: R E — Read/Execute).
    • One for data (Flags: RW — Read/Write).
  • PT_INTERP: This contains the string path to the dynamic linker (e.g., /lib64/ld-linux-x86-64.so.2). The kernel reads this to know which helper program is needed to load shared libraries.
  • PT_DYNAMIC: Points to the .dynamic section. Among other things, this contains the list of shared libraries (.so files) the program needs to run.
  • PT_GNU_STACK: This is a security header. It tells the kernel whether the program's stack should be executable. (Almost always set to non-executable to make stack-based buffer overflow exploits harder).
  • PT_PHDR: This header specifies the location and size of the Program Header Table itself in the memory image.

Segment vs. Section (The Mapping)

One Segment (in the PHDR) usually contains multiple Sections.

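For example, with a traditional linker layout, a single PT_LOAD segment with R E (Read/Execute) permissions might contain the following (newer linkers often place .rodata in its own read-only segment):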

  • .text (Your code)
  • .rodata (Constant strings)
  • .init (Initialization code)

The kernel doesn't want to map 20 different sections individually because it's inefficient. Instead, the linker groups sections with the same permissions into one large Segment described by a PHDR.
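
The grouping can be sketched as a simple containment test: a section belongs to a segment when its address range lies inside the segment's range. All addresses and sizes below are invented for illustration, and sections_in_segment is a hypothetical helper; readelf -l prints a comparable "Section to Segment mapping" for real files.

```python
def sections_in_segment(segment, sections):
    """Names of the sections whose address range lies inside the segment."""
    start = segment["vaddr"]
    end = start + segment["memsz"]
    return [
        s["name"] for s in sections
        if start <= s["addr"] and s["addr"] + s["size"] <= end
    ]

# Hypothetical layout: one executable segment and one writable segment.
segments = [
    {"flags": "R E", "vaddr": 0x1000, "memsz": 0x2000},
    {"flags": "RW",  "vaddr": 0x4000, "memsz": 0x1000},
]
sections = [
    {"name": ".init", "addr": 0x1000, "size": 0x20},
    {"name": ".text", "addr": 0x1020, "size": 0x1800},
    {"name": ".data", "addr": 0x4000, "size": 0x100},
    {"name": ".bss",  "addr": 0x4100, "size": 0x200},
]

for seg in segments:
    print(seg["flags"], sections_in_segment(seg, sections))
```

With these numbers the first segment covers .init and .text, and the second covers .data and .bss, so the kernel needs two mappings rather than four.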

The Kernel doesn't actually need "Sections"

This is the most surprising part of ELF flexibility: Sections are optional for execution.

  • The Linker (at compile time) needs Sections to organize code and data.
  • The Kernel (at runtime) only needs Segments (defined in the Program Headers).

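You can actually delete the entire "Section Header Table" from an ELF file, and the program will still run perfectly. (Plain strip does not go this far: it removes .symtab and debug sections but keeps the table. A tool such as sstrip from the ELFkickers collection removes the table itself.) The kernel just looks at the Program Headers, maps the memory blocks, and jumps to the entry point. The sections are just "scaffolding" used during construction.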

How to see the PHDR

You use the readelf command with the -l flag (lowercase L; the long form is --program-headers, with --segments as an alias):

readelf -l /bin/ls

Example Output (simplified):

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000003c60 0x0000000000003c60  R      0x1000
  LOAD           0x0000000000004000 0x0000000000004000 0x0000000000004000
                 0x0000000000013d31 0x0000000000013d31  R E    0x1000