Kubernetes - Node

A Node is the abstraction of a machine, which may mean different things in different environments:

  • in a bare metal cluster: node = a bare metal machine
  • in a cluster on a cloud provider, e.g. GKE: node = a GCE VM
  • in a kind cluster: node = a Docker container
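
To see what kind of machine each node actually is (OS image, kernel version, container runtime), a quick check:

$ kubectl get nodes -o wide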

To mark a Node unschedulable, run:

$ kubectl cordon $NODENAME
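
After cordoning, the node's STATUS shows SchedulingDisabled; kubectl uncordon makes it schedulable again:

$ kubectl get node $NODENAME     # STATUS shows Ready,SchedulingDisabled
$ kubectl uncordon $NODENAME     # mark it schedulable again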

Pods that are part of a DaemonSet tolerate being run on an unschedulable Node. DaemonSets typically provide node-local services that should run on the Node even if it is being drained of workload applications.
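
This works because the DaemonSet controller automatically adds a toleration along these lines to DaemonSet Pods:

tolerations:
  - key: node.kubernetes.io/unschedulable
    operator: Exists
    effect: NoSchedule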

Capacity

The default is 110 pods per node. The pod limit is set and enforced by the kubelet running on the node. It can be configured in the kubelet's config (the --max-pods flag is deprecated; instead, change maxPods in the config file specified by --config).

Only Pods that have been assigned to a node and have not yet terminated (i.e., are not in the Failed or Succeeded phase) are counted against this capacity.
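
To see how many Pods are currently counted against a node (kubectl supports field selectors on spec.nodeName and status.phase):

$ kubectl get pods --all-namespaces --no-headers \
    --field-selector spec.nodeName=<node_name>,status.phase!=Failed,status.phase!=Succeeded | wc -l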

To check the pod limit of a node:

$ kubectl get nodes <node_name> -o json | jq -r '.status.capacity.pods'
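
An equivalent without jq, using kubectl's built-in jsonpath output:

$ kubectl get node <node_name> -o jsonpath='{.status.capacity.pods}'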

Check kubelet status:

$ systemctl status kubelet

If you see something like /usr/bin/kubelet --max-pods ..., the pod limit is configured by the --max-pods flag. The command-line flag can be changed in the kubelet service config, e.g. /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.
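
systemctl cat shows the full unit including its drop-ins, which is a quick way to see which flags the kubelet was started with; systemd needs a reload after editing a drop-in:

$ systemctl cat kubelet
$ systemctl daemon-reload    # after editing the drop-in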

Otherwise, find the config file path from the kubelet command line, which may look like /usr/bin/kubelet --config=/var/lib/kubelet/config.yaml ....
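
To check whether maxPods is already set in that file (if the key is absent, the default of 110 applies):

$ grep -i maxPods /var/lib/kubelet/config.yaml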

To change the maxPods in config file:

$ yq -i e '.maxPods=500' /var/lib/kubelet/config.yaml

Restart kubelet:

$ systemctl restart kubelet
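
Once the kubelet is back up, the node's capacity should reflect the new value:

$ kubectl get node <node_name> -o jsonpath='{.status.capacity.pods}'   # should now print 500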

npd

node-problem-detector aims to make various node problems visible to the upstream layers in the cluster management stack. It is a daemon that runs on each node, detects node problems, and reports them to the apiserver. node-problem-detector can run either as a DaemonSet or standalone. It runs as a Kubernetes Addon enabled by default in GKE clusters, and it is also enabled by default in AKS as part of the AKS Linux Extension.
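
npd reports permanent problems as NodeConditions and temporary problems as Events on the Node object, so both are visible via:

$ kubectl describe node <node_name>    # see the Conditions and Events sections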

npd (node-problem-detector) uses crictl pods --latest to determine whether containerd is healthy. If it is not, npd will keep restarting it.
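
To run a similar check by hand on the node (crictl reads its runtime endpoint from /etc/crictl.yaml):

$ crictl pods --latest          # errors or hangs if the container runtime is unhealthy
$ systemctl status containerd   # shows whether npd has been restarting it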

Control Plane Nodes

Control plane nodes may have a node-role.kubernetes.io/control-plane taint with the NoSchedule or PreferNoSchedule effect:

spec:
  taints:
    - effect: PreferNoSchedule
      key: node-role.kubernetes.io/control-plane

Pods need to tolerate node-role.kubernetes.io/control-plane in order to run on control plane nodes:

tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
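
Tolerating the taint only permits scheduling there; to actually pin a Pod to control plane nodes, the toleration can be combined with a nodeSelector on the node-role.kubernetes.io/control-plane label (kubeadm sets this label on control plane nodes), roughly:

spec:
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists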

To remove the taint:

$ kubectl taint nodes node1 node-role.kubernetes.io/control-plane:NoSchedule-
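
To add the taint back:

$ kubectl taint nodes node1 node-role.kubernetes.io/control-plane:NoSchedule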