Kubernetes - cgroup

How does a container in k8s appear in the cgroup tree?

In Kubernetes, the cgroup tree is organized to reflect the hierarchy of the node. It isn't just a flat list of containers; it is a nested structure that allows Kubernetes to enforce resource limits at the Container, Pod, and Node levels simultaneously.

The root of this tree is a cgroup directory called kubepods (kubepods.slice when the systemd driver is used).

The High-Level Hierarchy (QoS Classes)

Kubernetes groups Pods into three "Quality of Service" (QoS) classes based on their resource requests and limits. The cgroup tree is split accordingly to help the Linux kernel prioritize resources.

The structure under /sys/fs/cgroup/ looks like this:

  1. Guaranteed: Pods in which every container sets CPU and memory requests equal to its limits. These are placed directly under the kubepods root because they are the highest priority.
  2. Burstable: Pods that set at least one request or limit but do not meet the Guaranteed criteria (for example, requests < limits). These are placed in a sub-directory: kubepods/burstable/.
  3. BestEffort: Pods with no requests or limits on any container. These are placed in kubepods/besteffort/.
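
As a rough illustration of the three rules above, the following shell sketch classifies a single-container pod from its four resource values. The function name qos_class is hypothetical, an empty string stands for an unset field, and quantities are compared as plain strings (so 500m and 0.5 would not be treated as equal). The real classification looks at every container in the pod. On a live cluster, the class Kubernetes actually assigned can be read with kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'.

```shell
# Hypothetical helper: classify a single-container pod.
# Args: cpu_request cpu_limit mem_request mem_limit ("" means unset).
qos_class() {
  cpu_req="$1"; cpu_lim="$2"; mem_req="$3"; mem_lim="$4"

  # Nothing set at all: BestEffort.
  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo BestEffort
    return 0
  fi

  # An omitted request defaults to the limit, if a limit is set.
  if [ -z "$cpu_req" ]; then cpu_req="$cpu_lim"; fi
  if [ -z "$mem_req" ]; then mem_req="$mem_lim"; fi

  # Both limits present and equal to the requests: Guaranteed.
  if [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
     [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}
```

For example, qos_class 250m 500m 128Mi 128Mi prints Burstable, because the CPU request is below the CPU limit.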

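Why this matters: The kubelet gives each container an oom_score_adj value derived from its QoS class (highest for BestEffort, lowest for Guaranteed), so if the node runs out of memory, the kernel's OOM killer picks BestEffort processes first.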

The Path to a Container

A container’s full path in the cgroup tree follows this logic: Root -> QoS Level -> Pod Level -> Container Level

A. If using the cgroupfs driver:

The paths are literal directory names based on IDs.

  • Path (cgroup v1, memory controller): /sys/fs/cgroup/memory/kubepods/burstable/pod<POD-UID>/<CONTAINER-ID>

B. If using the systemd driver (The modern standard):

Systemd uses a "slice" and "scope" naming convention. It converts the IDs into a format it can manage as units.

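  • Path (cgroup v2): /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<POD-UID>.slice/cri-containerd-<ID>.scope
  • The scope prefix depends on the container runtime: cri-containerd- for containerd, crio- for CRI-O.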
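
To make the two naming schemes concrete, here is a small shell sketch that builds the Pod-level directory for either driver. The function name pod_cgroup_dir and its argument order are hypothetical. The result is relative to the cgroup mount point (under cgroup v1 that is a per-controller directory such as /sys/fs/cgroup/memory), and the sketch assumes the systemd form replaces dashes in the UID with underscores.

```shell
# Hypothetical helper: print the Pod-level cgroup directory,
# relative to the cgroup mount point.
# Args: driver (cgroupfs|systemd), QoS class, pod UID.
pod_cgroup_dir() {
  driver="$1"; qos="$2"; uid="$3"

  # Guaranteed pods sit directly under the kubepods root.
  case "$qos" in
    Guaranteed) tier="" ;;
    Burstable)  tier="burstable" ;;
    BestEffort) tier="besteffort" ;;
    *) echo "unknown QoS class: $qos" >&2; return 1 ;;
  esac

  case "$driver" in
    cgroupfs)
      if [ -n "$tier" ]; then
        echo "kubepods/$tier/pod$uid"
      else
        echo "kubepods/pod$uid"
      fi
      ;;
    systemd)
      # systemd unit names use "-" as a hierarchy separator,
      # so dashes inside the UID become underscores.
      suid=$(printf '%s' "$uid" | tr '-' '_')
      if [ -n "$tier" ]; then
        echo "kubepods.slice/kubepods-$tier.slice/kubepods-$tier-pod$suid.slice"
      else
        echo "kubepods.slice/kubepods-pod$suid.slice"
      fi
      ;;
    *) echo "unknown driver: $driver" >&2; return 1 ;;
  esac
}
```

Calling pod_cgroup_dir systemd Burstable <POD-UID> yields the slice that contains the per-container scopes.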

Visualizing the Tree

If you have a Burstable Pod with two containers, the tree (using the systemd driver) looks like this:

kubepods.slice (The whole K8s resource pool)
└── kubepods-burstable.slice (All Burstable pods)
    └── kubepods-burstable-pod<POD_UID>.slice (The specific Pod)
        ├── cri-containerd-<ID_1>.scope (Container 1)
        └── cri-containerd-<ID_2>.scope (Container 2)

What is inside these directories?

In each of these folders (at any level), you will find the control files that enforce the limits you set in your YAML:

  • cpu.shares (v1) / cpu.weight (v2): Set based on your resources.requests.cpu.
  • cpu.cfs_quota_us (v1) / cpu.max (v2): Set based on your resources.limits.cpu.
  • memory.limit_in_bytes (v1) / memory.max (v2): Set based on your resources.limits.memory.
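
The arithmetic behind the CPU files can be sketched in shell. This is an approximation under stated assumptions, not the kubelet's code: 1 CPU (1000m) maps to 1024 shares with a floor of 2, the CFS period is the default 100000 µs, and the quota is floored at 1000 µs. The function names are hypothetical, and CPU values are passed as whole millicores.

```shell
# Hypothetical helpers; all inputs are integer millicores (500 = 500m).

# resources.requests.cpu -> cpu.shares (cgroup v1)
cpu_shares() {
  shares=$(( $1 * 1024 / 1000 ))
  if [ "$shares" -lt 2 ]; then shares=2; fi   # assumed minimum
  echo "$shares"
}

# resources.limits.cpu -> cpu.cfs_quota_us (cgroup v1)
# Optional second argument: CFS period in microseconds.
cpu_quota_us() {
  period_us="${2:-100000}"
  quota=$(( $1 * period_us / 1000 ))
  if [ "$quota" -lt 1000 ]; then quota=1000; fi   # assumed minimum
  echo "$quota"
}

# resources.limits.cpu -> contents of cpu.max (cgroup v2): "<quota> <period>"
cpu_max_line() {
  period_us="${2:-100000}"
  echo "$(cpu_quota_us "$1" "$period_us") $period_us"
}
```

With these assumptions, a 500m request becomes 512 shares, and a 500m limit becomes a quota of 50000 µs per 100000 µs period, i.e. half a core.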

The "Inception" of Limits:

  • The Container cgroup enforces the limit for that specific app.
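  • The Pod cgroup enforces the total limit for all containers in the pod (plus the pause container); it is set only when every container declares a limit.
  • The Burstable cgroup groups all burstable pods on the node. It mainly carries a CPU weight derived from their combined requests; the node-wide ceiling for all pods is enforced one level up, on kubepods.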
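
Because the limits nest, the ceiling a process actually hits is the tightest one on the path from its container cgroup up to kubepods. The shell sketch below walks that path and prints each level's memory limit. It assumes cgroup v2 (memory.max; a value of max means unlimited), and the function name walk_memory_limits is hypothetical.

```shell
# Hypothetical helper: print "<cgroup name> <memory.max>" for a cgroup
# and each of its ancestors, stopping at (and including) a given directory.
# Args: starting cgroup directory, directory to stop at.
walk_memory_limits() {
  dir="$1"
  stop="$2"
  while :; do
    # Levels without a memory.max file are skipped.
    if [ -f "$dir/memory.max" ]; then
      printf '%s %s\n' "$(basename "$dir")" "$(cat "$dir/memory.max")"
    fi
    # Stop at the requested directory, or at the top of the path.
    if [ "$dir" = "$stop" ] || [ "$dir" = "/" ] || [ "$dir" = "." ]; then
      break
    fi
    dir=$(dirname "$dir")
  done
}
```

On a node this would be called with a container scope directory as the first argument and /sys/fs/cgroup/kubepods.slice as the second.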

How to find your Pod in the tree

If you want to manually inspect a Pod's cgroups on a Linux node:

  1. Get the Pod UID:

    kubectl get pod <pod-name> -o jsonpath='{.metadata.uid}'
    

    (Note: Kubernetes replaces dashes - with underscores _ in the cgroup path if using systemd).

  2. Find the directory on the node:

    # For a Burstable pod on a systemd node:
    cd /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/
    ls | grep <first-few-chars-of-UID>
    
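  3. Check the memory limit from inside the Pod's directory:

    cd kubepods-burstable-pod<POD_UID>.slice
    cat memory.max    # cgroup v2; on cgroup v1 read memory.limit_in_bytes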
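
The three steps can be combined into one shell sketch, assuming a systemd-driver node. The function name pod_memory_limit is hypothetical. It takes the cgroup mount point as its first argument (for example /sys/fs/cgroup on cgroup v2) and the Pod UID as its second, searches all QoS levels, and reads whichever memory limit file exists.

```shell
# Hypothetical helper: print the memory limit of a Pod's cgroup.
# Args: cgroup mount point, pod UID (as printed by kubectl).
pod_memory_limit() {
  cgroup_root="$1"
  uid="$2"

  # systemd-style names use underscores where the UID has dashes.
  suid=$(printf '%s' "$uid" | tr '-' '_')

  # Guaranteed pods are one level below kubepods.slice,
  # Burstable and BestEffort pods are two levels below.
  pod_dir=$(find "$cgroup_root/kubepods.slice" -maxdepth 2 -type d \
              -name "*pod${suid}.slice" | head -n 1)
  if [ -z "$pod_dir" ]; then
    echo "pod cgroup not found for UID $uid" >&2
    return 1
  fi

  if [ -f "$pod_dir/memory.max" ]; then
    cat "$pod_dir/memory.max"               # cgroup v2
  elif [ -f "$pod_dir/memory.limit_in_bytes" ]; then
    cat "$pod_dir/memory.limit_in_bytes"    # cgroup v1
  else
    echo "no memory limit file in $pod_dir" >&2
    return 1
  fi
}
```

A typical call on the node would be pod_memory_limit /sys/fs/cgroup "<POD-UID>", using the UID from step 1.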