containerd - Snapshotter

In containerd, a Snapshotter is the component responsible for managing container image layers and the writable filesystems of running containers.

To understand why snapshotters exist, you have to look at how container images are built. An image isn't one big file; it is a stack of read-only layers. The Snapshotter is the "librarian" that manages how those layers are stacked, stored, and turned into a real directory the container can use.

Why do we need it? (The Layer Problem)

When you pull an image (like nginx), you are downloading several compressed tarballs (layers).

Layer 1: Base OS (e.g., Debian)
Layer 2: Dependencies
Layer 3: Nginx binary

You cannot run a program directly inside a bunch of tarballs. You need a unified filesystem. Furthermore, container layers are read-only. When a container runs, it needs to be able to write files (logs, temp files) without changing the original image.

The Snapshotter solves this using Snapshots.
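As a rough mental model of the layer problem (not containerd code), Python's collections.ChainMap behaves much like a stacked filesystem: lookups fall through the layers from top to bottom, and writes land only in the top-most mapping. The paths and file contents below are made up for illustration.

```python
from collections import ChainMap

# Read-only image layers, listed bottom to top. Contents are invented.
base_os = {"/etc/os-release": "Debian", "/bin/sh": "dash"}
dependencies = {"/usr/lib/libssl.so": "libssl"}
nginx = {"/usr/sbin/nginx": "nginx binary", "/etc/os-release": "Debian (patched)"}

# The container's private writable layer starts out empty.
writable = {}

# ChainMap searches left to right, so the top-most layer goes first.
# Every write goes into the first mapping only.
rootfs = ChainMap(writable, nginx, dependencies, base_os)

rootfs["/var/log/nginx/access.log"] = "GET / 200"

print(rootfs["/etc/os-release"])             # the upper layer shadows the base layer
print(sorted(writable))                      # only the new log file lives here
print("/var/log/nginx/access.log" in nginx)  # the image layer is untouched
```

The image layers never change, which is why many containers can share them while each keeps its own small writable layer.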

How it Works: The Snapshot Chain

A Snapshotter works with three types of snapshots:

Committed Snapshot: A read-only layer. These represent the layers of your image. Once they are "committed," they can never change.
Active Snapshot: A writable layer. containerd creates one while unpacking each image layer, and another when you start a container. The container's active snapshot is where your application writes its data.
View: A read-only "mount" of a snapshot, usually used for inspecting an image.

The Workflow:

containerd pulls an image.
containerd extracts the first layer into a writable snapshot and then commits it, producing a Committed Snapshot.
The second layer is extracted "on top" of the first, creating a new Committed Snapshot that points to the first as its Parent.
When you start the container, the Snapshotter creates an Active Snapshot. It uses Copy-on-Write (CoW) logic, so it only stores the changes you make during the container's life.
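The workflow above can be sketched with a toy in-memory snapshotter. This is only an illustration: the class, method names, and keys such as "extract-1" are hypothetical and do not mirror containerd's actual Go interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Kind(Enum):
    COMMITTED = "committed"
    ACTIVE = "active"
    VIEW = "view"


@dataclass
class Snapshot:
    kind: Kind
    parent: Optional[str]
    # Only the files changed in this layer, never a full copy of the parents.
    files: Dict[str, str] = field(default_factory=dict)


class ToySnapshotter:
    """Hypothetical in-memory stand-in for a snapshotter."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, Snapshot] = {}

    def prepare(self, key: str, parent: Optional[str] = None) -> Snapshot:
        """Create a writable (active) snapshot on top of an optional parent."""
        self.snapshots[key] = Snapshot(Kind.ACTIVE, parent)
        return self.snapshots[key]

    def commit(self, name: str, key: str) -> None:
        """Freeze the active snapshot `key` under the permanent `name`."""
        snap = self.snapshots.pop(key)
        snap.kind = Kind.COMMITTED
        self.snapshots[name] = snap

    def chain(self, key: str) -> List[str]:
        """Return snapshot names from `key` down to the base layer."""
        names: List[str] = []
        current: Optional[str] = key
        while current is not None:
            names.append(current)
            current = self.snapshots[current].parent
        return names


s = ToySnapshotter()

# Layers 1 and 2 of the image: extract into an active snapshot, then commit.
s.prepare("extract-1").files["/bin/sh"] = "dash"
s.commit("layer-1", "extract-1")
s.prepare("extract-2", parent="layer-1").files["/usr/sbin/nginx"] = "nginx"
s.commit("layer-2", "extract-2")

# Starting a container adds one more active snapshot on top of the image.
s.prepare("container-rw", parent="layer-2").files["/tmp/scratch"] = "data"

print(s.chain("container-rw"))  # ['container-rw', 'layer-2', 'layer-1']
```

Each snapshot stores only its own changes plus a pointer to its parent, which is the Copy-on-Write idea in miniature.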

Common Snapshotter Types

Because Linux has many different filesystems, containerd has multiple snapshotter plugins:

overlayfs (The Default): The most common and efficient for modern Linux. It uses the kernel's overlay filesystem to stack directories into one merged view. It’s very fast because the merge happens inside the kernel and no file is copied until a container writes to it.
btrfs / zfs: If your host machine is formatted with Btrfs or ZFS, these snapshotters use the filesystem's built-in "atomic snapshot" features.
devmapper: Uses thin-provisioned device-mapper volumes (a thin pool). Docker historically offered a similar devicemapper storage driver; the approach is now less common but still used in specific secure/enterprise environments.
native: A simple snapshotter that just copies files. It is very slow and uses a lot of disk space because it doesn't use Copy-on-Write (it physically copies the whole layer).

The "Expert" Level: Remote Snapshotters (Lazy Pulling)

Standard snapshotters require you to download 100% of the image before the container starts. This is a bottleneck for large AI or Java images.

Remote Snapshotters (like stargz or nydus) are the high-performance evolution. They allow a container to start before the image is finished downloading.

The snapshotter mounts a "virtual" filesystem.
When the container tries to read an ELF binary that hasn't been downloaded yet, the remote snapshotter fetches only that specific block of data from the registry over the network.
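This can cut container startup time dramatically (for example, from around 30 seconds to around 2 seconds), although the actual gain depends on the image and the network.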
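The fetch-on-first-read idea can be sketched in a few lines. This is a toy model, not the stargz or nydus format: FakeRegistryBlob, LazyFile, and the 4-byte chunk size are all hypothetical names and values chosen for illustration.

```python
class FakeRegistryBlob:
    """Stands in for a remote layer; counts bytes sent over the 'network'."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.size = len(data)
        self.bytes_served = 0

    def fetch_range(self, offset: int, length: int) -> bytes:
        chunk = self._data[offset:offset + length]
        self.bytes_served += len(chunk)
        return chunk


class LazyFile:
    """Fetches fixed-size chunks on first read and caches them locally."""

    def __init__(self, blob: FakeRegistryBlob, chunk_size: int = 4) -> None:
        self.blob = blob
        self.chunk_size = chunk_size
        self.cache = {}  # chunk index -> bytes

    def read(self, offset: int, length: int) -> bytes:
        end = min(offset + length, self.blob.size)
        out = bytearray()
        pos = offset
        while pos < end:
            index = pos // self.chunk_size
            if index not in self.cache:
                # Cache miss: go to the registry for just this chunk.
                self.cache[index] = self.blob.fetch_range(
                    index * self.chunk_size, self.chunk_size
                )
            chunk = self.cache[index]
            start = pos - index * self.chunk_size
            take = min(len(chunk) - start, end - pos)
            out += chunk[start:start + take]
            pos += take
        return bytes(out)


blob = FakeRegistryBlob(b"0123456789abcdef")
f = LazyFile(blob)

print(f.read(5, 2))       # b'56'
print(blob.bytes_served)  # 4: only one chunk was downloaded
print(f.read(6, 4))       # b'6789'
print(blob.bytes_served)  # 8: one more chunk, the first came from cache
```

The container sees an ordinary file, while only the chunks it actually touches cross the network.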

How to see it in action

You can use ctr (the containerd CLI tool) to see the snapshots on your system:

# List the snapshotter plugins and their status
sudo ctr plugins ls | grep snapshotter

# List the snapshots managed by the overlayfs snapshotter
sudo ctr snapshots --snapshotter overlayfs ls

What does "commit" mean, technically

Technically, in containerd, "Commit" is the specific transition of a snapshot from a writable state to an immutable (read-only) state.

Think of it like a Git commit or "finalizing" a concrete mold. Once you commit a snapshot, it becomes a permanent, named layer that can never be changed again.

The State Transition: From "Active" to "Committed"

When containerd is pulling an image or building a new layer, the snapshotter follows a strict lifecycle:

Prepare: The snapshotter creates an Active Snapshot. This is a writable directory (or block device).
The Work: containerd extracts a .tar.gz layer into that directory. At this point, the snapshot is "dirty": it’s still being modified.
Commit: Once the extraction is finished and the files are all there, containerd calls the Commit() function.

What "Commit" does under the hood

When the Commit() function is triggered, the Snapshotter performs three technical tasks:

Immutability (The Lockdown)

The snapshotter marks the entire directory or subvolume as Read-Only.

In overlayfs, it is essentially a metadata change in the snapshotter's database; the directory on disk stays where it is.
In btrfs or zfs, it triggers a literal filesystem "Read-Only" property on the subvolume.
Why? This ensures that if 10 different containers use this layer, none of them can accidentally (or maliciously) modify the underlying files.

Identity Assignment (The Hash)

While a snapshot is Active, it is usually referred to by a temporary "key" (a random string). When you Commit, it is given its permanent Name (usually the ChainID or a cryptographic hash of the content). This is how containerd knows: "I already have the Ubuntu Base Layer; I don't need to download it again."
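For reference, the OCI image specification defines the ChainID recursively: the ChainID of the first layer is its DiffID (the digest of the uncompressed layer), and each later ChainID is the digest of the previous ChainID, a space, and the next DiffID. A minimal sketch of that rule follows; the layer contents are placeholder bytes, not real tar archives.

```python
import hashlib
from typing import List


def diff_id(uncompressed_layer: bytes) -> str:
    """Digest of the uncompressed layer content, in 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(uncompressed_layer).hexdigest()


def chain_ids(diff_ids: List[str]) -> List[str]:
    """Return the ChainID for each prefix of the layer stack, bottom layer first."""
    result: List[str] = []
    for d in diff_ids:
        if not result:
            # ChainID(L0) = DiffID(L0)
            result.append(d)
        else:
            # ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
            joined = f"{result[-1]} {d}".encode()
            result.append("sha256:" + hashlib.sha256(joined).hexdigest())
    return result


# Placeholder layer contents standing in for uncompressed tar archives.
layers = [b"base os files", b"dependencies", b"nginx binary"]
diffs = [diff_id(layer) for layer in layers]

for name in chain_ids(diffs):
    print(name)
```

Because each ChainID depends on everything below it, two images that share the same bottom layers produce the same names for those layers, and the shared snapshots can be reused.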

Metadata Finalization

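The snapshotter finalizes the Parent Relationship in its database. The storage is BoltDB: containerd's own metadata store lives at /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db, and snapshotters such as overlayfs keep an additional metadata.db under their own directory.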

It logs: "Snapshot X is the child of Snapshot Y."
Once committed, this relationship is set in stone. You cannot change the parent of a committed snapshot.

Why can't we just use Active Snapshots?

You might wonder: "Why bother committing? Why not just use the writable directory?"

The reason is Deduplication and Branching.

You cannot create a "Child" snapshot from an "Active" (writable) parent.
Technically: If the parent is still changing, the child’s "diff" (the changes) would be calculated against a moving target, leading to data corruption.
By Committing, you create a "Frozen Base." Multiple children can now branch off that same frozen base simultaneously.
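The rule can be sketched as a single check at prepare time. The functions and dictionary layout below are hypothetical; they only illustrate that a parent must be committed before anything can branch off it.

```python
def prepare(snapshots, key, parent=None):
    """Create an active snapshot; the parent, if any, must already be committed."""
    if parent is not None and snapshots[parent]["kind"] != "committed":
        raise ValueError(f"parent {parent!r} is not committed")
    snapshots[key] = {"kind": "active", "parent": parent}


def commit(snapshots, name, key):
    """Freeze an active snapshot under its permanent name."""
    snap = snapshots.pop(key)
    snap["kind"] = "committed"
    snapshots[name] = snap


snaps = {}
prepare(snaps, "extract-base")
commit(snaps, "base", "extract-base")

# Two containers branch off the same frozen base at the same time.
prepare(snaps, "container-a", parent="base")
prepare(snaps, "container-b", parent="base")

# Branching off a still-writable snapshot is rejected.
try:
    prepare(snaps, "child-of-a", parent="container-a")
except ValueError as err:
    print("rejected:", err)
```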

A Concrete Example: OverlayFS

If you are using the overlayfs snapshotter:

Active: containerd creates a directory in /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/<ID>/. You can cd into it and see the files under its fs/ subdirectory.
Commit: containerd calls Commit.
- It updates the internal metadata to say this ID is now "Committed."
- It records the snapshot's disk usage. The files themselves are not moved or copied.
Usage: The next time a container starts, it uses that directory as a LowerDir (a read-only layer in OverlayFS parlance).
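To make the LowerDir idea concrete, here is a sketch of how the overlay mount options could be assembled from a snapshot chain. It assumes each snapshot directory holds an fs/ subdirectory for files and a work/ subdirectory for OverlayFS scratch space; the numeric IDs and the helper name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

ROOT = PurePosixPath(
    "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots"
)


def overlay_options(active_id: str, parent_ids: List[str]) -> str:
    """Build an overlay mount option string.

    parent_ids must be ordered nearest parent first, because OverlayFS
    treats the left-most lowerdir as the top-most read-only layer.
    This sketch assumes at least one parent.
    """
    if not parent_ids:
        raise ValueError("overlay needs at least one lower directory")
    lower = ":".join(str(ROOT / p / "fs") for p in parent_ids)
    upper = ROOT / active_id / "fs"
    work = ROOT / active_id / "work"
    return f"lowerdir={lower},upperdir={upper},workdir={work}"


# Hypothetical IDs: 3 is the container's active snapshot, 2 and 1 are committed.
print(overlay_options("3", ["2", "1"]))
```

A string of this shape is what would be passed to mount -t overlay through its -o option.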

Image vs Snapshot

The key difference between a snapshot and an image lies in how they are stored:

A snapshot is an unpacked directory on a local disk used by a runtime.
An image is a standardized, portable bundle (a manifest, a configuration, and layer tarballs) that can be pushed to and pulled from a registry.

Comparison

| Feature | Image (The "Logical" View) | Snapshot (The "Physical" View) |
| --- | --- | --- |
| What is it? | A portable, versioned "package." | A specific state of a filesystem on disk. |
| Composition | A Manifest (JSON) + Layers (Tarballs). | A directory or subvolume in a Snapshotter. |
| Portability | High (Can be pushed to a Registry). | Low (Specific to that host's disk). |
| Role | The Blueprint. | The Bricks used to build the house. |

Are these concepts of Containers, `containerd`, or Linux?

It is a hierarchy. Each layer "wraps" the one below it.

A. Linux (The Primitives)

Linux doesn't really have a native concept of a "Container Image." It has Filesystems and Drivers (like OverlayFS, XFS, Btrfs). Linux provides the ability to "stack" directories or "clone" blocks. This is the raw material.

B. `containerd` (The Snapshotter)

containerd is the bridge. It takes the raw Linux primitives and organizes them into Snapshots.

When containerd pulls an image, it extracts the data into snapshots.
A Snapshot is a containerd object that manages a specific Linux directory.
Note: the Snapshotter doesn't care about "versions" or "tags" (like nginx:latest); it only cares about "Parent Snapshot ID" and "Child Snapshot ID." Tags are tracked separately, at the image level.

C. Containers / OCI (The Image)

The Image is a higher-level standard (managed by the OCI - Open Container Initiative). An Image is a collection of metadata that tells containerd: "To create this environment, you need to stack Snapshot A, then Snapshot B, then Snapshot C, and then run the ELF binary located at /bin/nginx."

How an Image becomes Snapshots

Think of an Image like a Lego Instruction Manual, and Snapshots like the Lego Bricks already snapped together in your drawer.

The Registry: An image exists on a server (like Docker Hub) as a "Manifest" (a JSON file) and a bunch of "Layers" (compressed .tar.gz files).
The Pull: When you pull the image, containerd downloads the tarballs.
The Unpacking (The Snapshotter): containerd asks the Snapshotter to create a chain of snapshots.
- It unpacks Layer 1 into Snapshot 1.
- It Commits Snapshot 1 (making it read-only).
- It unpacks Layer 2 into Snapshot 2 (using Snapshot 1 as a parent), then Commits Snapshot 2 as well, and so on for each remaining layer.
The Result: You now have an Image (the logical name) which is physically represented on your disk as a Chain of Snapshots.

The Relationship Diagram

Image (OCI Concept): my-app:v1
- Points to... Manifest JSON, which references the Config (Env vars, Entrypoint)
- Points to... Layer Hashes (sha256:...)
Snapshotter (containerd Concept):
- Snapshot A (Base Layer)
- Snapshot B (App Code, Parent=A)
- Active Snapshot C (Writable layer for the running container, Parent=B)
VFS / Filesystem (Linux Concept):
- OverlayFS mounting snapshots/A and snapshots/B as lowerdirs and snapshots/C as upperdir.

Summary

Snapshotter = The "Layer Manager" of containerd.
It turns Static Layers (on disk) into a Unified Rootfs (a set of mounts the container sees as one directory tree).
It utilizes Linux Kernel features like OverlayFS or CoW to make containers start quickly without duplicating huge amounts of data.