Linux - Page Tables
In the context of the Linux kernel, PGD stands for Page Global Directory.
It is the top level of the multi-level page table system used by the Linux kernel to translate Virtual Memory addresses (used by applications) into Physical Memory addresses (actual RAM chips).
The Big Picture: Why do we need PGD?
Every process in Linux sees what looks like a large, contiguous block of memory of its own (e.g., 128 TiB of user address space on x86-64 with 4-level paging). In reality, physical RAM is fragmented and shared among hundreds of processes.
To manage this, the CPU and kernel use Paging. A single flat table with an entry for every page of such an address space would take hundreds of gigabytes per process, so Linux uses a sparse hierarchy instead (like a folder structure), in which only the parts that are actually in use need to exist. The PGD is the "Root Folder" of that hierarchy.
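The cost of a flat table can be checked with a little arithmetic. The sketch below is only an illustration: the function name flat_table_bytes is made up for this example, and the 4 KiB page size and 8-byte entry size are assumptions that match x86-64 but are not stated in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for ONE flat table holding an entry for every page of a
 * va_bits-wide virtual address space.
 * Assumed inputs (not from the article): pages of 2^page_shift bytes and
 * entry_bytes bytes per table entry. */
uint64_t flat_table_bytes(unsigned va_bits, unsigned page_shift,
                          uint64_t entry_bytes)
{
    /* Keep the shift well-defined and the product small enough for
     * typical entry sizes such as 4 or 8 bytes. */
    assert(va_bits > page_shift && va_bits - page_shift < 56);

    uint64_t pages = UINT64_C(1) << (va_bits - page_shift);
    return pages * entry_bytes;
}
```

Under these assumptions, flat_table_bytes(47, 12, 8) is 2^38 bytes (256 GiB) for a 128 TiB user address space, and flat_table_bytes(48, 12, 8) is 2^39 bytes (512 GiB) for a full 48-bit space. A hierarchy avoids this because tables for unused regions are simply never allocated.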
Where PGD sits in the Hierarchy
On a modern 64-bit system, Linux typically uses a 4-level or 5-level paging system. The translation flows like this:
- PGD (Page Global Directory) — The Top Level
- P4D (Page 4th Level Directory) — (Used in 5-level paging)
- PUD (Page Upper Directory)
- PMD (Page Middle Directory)
- PTE (Page Table Entry) — The Bottom Level (points to the actual page of RAM)
How the Translation Works
When a CPU wants to read a virtual memory address, it breaks that address into segments. It uses each segment as an index to find its way down the tree:
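- The CPU takes the top group of index bits of the address (bits 47 down to 39 on x86-64 with 4-level paging) and uses them as an index into the PGD.
- The entry in the PGD tells the CPU the physical address of the next table down (the P4D with 5-level paging, otherwise the PUD).
- The CPU keeps "walking the bits" this way, one table per level, until it reaches the PTE, which finally gives the physical address of the page. The lowest bits of the virtual address (12 bits for 4 KiB pages) are the offset of the data within that page.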
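The way an address is cut into segments can be sketched in user-space C. This is a minimal illustration, assuming the x86-64 layout with 4-level paging and 4 KiB pages (9 index bits per level, 12 offset bits). The names va_parts and split_va are hypothetical and are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: x86-64, 4-level paging, 4 KiB pages. */
enum {
    OFFSET_BITS = 12,                       /* byte offset inside a page */
    INDEX_BITS  = 9,                        /* 512 entries per table     */
    PTE_SHIFT   = OFFSET_BITS,              /* 12 */
    PMD_SHIFT   = PTE_SHIFT + INDEX_BITS,   /* 21 */
    PUD_SHIFT   = PMD_SHIFT + INDEX_BITS,   /* 30 */
    PGD_SHIFT   = PUD_SHIFT + INDEX_BITS    /* 39 */
};

/* The index into each level's table, plus the offset within the page. */
struct va_parts {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Extract the 9-bit table index that starts at bit position `shift`. */
static unsigned table_index(uint64_t va, unsigned shift)
{
    return (unsigned)((va >> shift) & ((1u << INDEX_BITS) - 1));
}

struct va_parts split_va(uint64_t va)
{
    /* With 48-bit virtual addresses, bits 63..47 must all be equal
     * (a "canonical" address). */
    uint64_t top = va >> 47;
    assert(top == 0 || top == 0x1FFFF);

    struct va_parts p;
    p.pgd    = table_index(va, PGD_SHIFT);
    p.pud    = table_index(va, PUD_SHIFT);
    p.pmd    = table_index(va, PMD_SHIFT);
    p.pte    = table_index(va, PTE_SHIFT);
    p.offset = (unsigned)(va & ((UINT64_C(1) << OFFSET_BITS) - 1));
    return p;
}
```

The hardware performs this split itself during a page walk. The sketch only makes visible which bits select which table entry.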
One PGD per Process
Each process in Linux has its own PGD (threads of the same process share one). This is the secret to Process Isolation:
- When the kernel switches from Process A to Process B (a context switch), it updates a special CPU register (on x86, this is the CR3 register) with the physical address of the new process's PGD.
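- Because Process B has a different PGD, it cannot reach the private memory of Process A. Its "root folder" leads to different physical RAM pages, except where the kernel deliberately maps the same pages into both (for example shared memory, shared libraries, and the kernel's own mappings).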
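The effect of swapping the root pointer can be shown with a toy model. This is not how the kernel or the hardware is written: it is a deliberately tiny two-level table with 16-bit addresses, and every name in it (toy_root, toy_leaf, toy_cr3, toy_switch_to, toy_translate) is invented for this sketch. The variable toy_cr3 only stands in for the real CR3 register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRIES     16          /* 4 index bits per level            */
#define TOY_PAGE_SHIFT  8           /* 256-byte toy pages                */
#define TOY_NOT_MAPPED  UINT32_MAX  /* marks "no translation"            */

struct toy_leaf { uint32_t frame[TOY_ENTRIES]; };          /* bottom level */
struct toy_root { struct toy_leaf *leaf[TOY_ENTRIES]; };   /* top level    */

/* Stand-in for CR3: the root table of whichever process is running. */
static const struct toy_root *toy_cr3;

/* Mark every slot of a leaf table as unmapped. */
void toy_leaf_init(struct toy_leaf *leaf)
{
    for (size_t i = 0; i < TOY_ENTRIES; i++)
        leaf->frame[i] = TOY_NOT_MAPPED;
}

/* "Context switch": only the root pointer changes. */
void toy_switch_to(const struct toy_root *root)
{
    toy_cr3 = root;
}

/* Walk the current root: top 4 bits, next 4 bits, then an 8-bit offset. */
uint32_t toy_translate(uint16_t va)
{
    assert(toy_cr3 != NULL);

    unsigned top = (va >> 12) & 0xF;
    unsigned low = (va >> TOY_PAGE_SHIFT) & 0xF;
    unsigned off = va & 0xFF;

    const struct toy_leaf *leaf = toy_cr3->leaf[top];
    if (leaf == NULL || leaf->frame[low] == TOY_NOT_MAPPED)
        return TOY_NOT_MAPPED;

    return (leaf->frame[low] << TOY_PAGE_SHIFT) | off;
}
```

If two roots map the same virtual page to frames 7 and 9, then translating the same address after toy_switch_to gives a physical address in frame 7 for one process and in frame 9 for the other. An address mapped under only one root has no translation at all under the other.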
Implementation in Code
In the Linux Kernel source code:
- The PGD is stored in the mm_struct (Memory Management struct) associated with every process.
- You will see a pointer: pgd_t *pgd;.
- The kernel uses macros like pgd_offset(mm, address) to find which entry in the PGD corresponds to a specific virtual address.
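What a lookup like pgd_offset(mm, address) computes can be modeled in user space as "base pointer plus index". The sketch below is a stand-alone model, not kernel code: the model_* names are hypothetical stand-ins for mm_struct, pgd_t and pgd_offset, and the shift of 39 and table size of 512 are assumptions that correspond to x86-64 with 4-level paging.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for x86-64 with 4-level paging. */
#define MODEL_PGDIR_SHIFT   39
#define MODEL_PTRS_PER_PGD  512

/* Stand-ins for the kernel's pgd_t and mm_struct (model only). */
typedef struct { uint64_t val; } model_pgd_t;
struct model_mm { model_pgd_t *pgd; };   /* mirrors the "pgd_t *pgd;" field */

/* Which slot of the top-level table covers this address? */
size_t model_pgd_index(uint64_t address)
{
    return (size_t)((address >> MODEL_PGDIR_SHIFT) & (MODEL_PTRS_PER_PGD - 1));
}

/* Pointer to the top-level entry for this address: base plus index. */
model_pgd_t *model_pgd_offset(const struct model_mm *mm, uint64_t address)
{
    assert(mm != NULL && mm->pgd != NULL);
    return mm->pgd + model_pgd_index(address);
}
```

In this model an address whose top index bits hold the value 3 resolves to the fourth entry of the table that the mm's pgd pointer refers to. The lower levels are then reached with the analogous offset helpers, one per level.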
"Folding" (Architecture Independence)
Not all CPUs have 4 or 5 levels of hardware paging. Some older or simpler CPUs only have 2 levels.
Linux is designed to be portable, so it uses a "Generic" paging model. On hardware that only supports 2 levels, Linux "folds" the levels in between (P4D, PUD and PMD). To the rest of the kernel code, it still looks like every level is there, but each folded level behaves like a table with a single entry, so the "jump" through it costs nothing: the lookup simply hands back the entry of the level above, which ultimately means the PGD entry.
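Folding can be pictured with a small user-space model, loosely patterned on how the kernel's generic "no PUD" and "no PMD" headers wrap one entry type inside another. It is a sketch under assumptions: the fold_* names are invented here, and the P4D level is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A real top-level entry, and two folded levels that merely wrap it. */
typedef struct { uint64_t val; } fold_pgd_t;
typedef struct { fold_pgd_t pgd; } fold_pud_t;   /* folded: one entry */
typedef struct { fold_pud_t pud; } fold_pmd_t;   /* folded: one entry */

/* Folded step: ignore the address, return the same entry, retyped. */
fold_pud_t *fold_pud_offset(fold_pgd_t *pgd, uint64_t address)
{
    (void)address;               /* no index bits belong to this level */
    assert(pgd != NULL);
    return (fold_pud_t *)pgd;
}

/* Folded step again: the "PMD entry" is still the PGD entry. */
fold_pmd_t *fold_pmd_offset(fold_pud_t *pud, uint64_t address)
{
    (void)address;
    assert(pud != NULL);
    return (fold_pmd_t *)pud;
}

/* Reading the folded entry reads the underlying PGD entry. */
uint64_t fold_pmd_val(const fold_pmd_t *pmd)
{
    return pmd->pud.pgd.val;
}
```

Generic code can still call the PUD and PMD steps in order, yet after both calls the pointer it holds is the address of the original PGD entry. With optimization enabled, a compiler can reduce these steps to nothing.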