
Kubernetes

Last Updated: 2023-01-29

Kubernetes is winning the container orchestration war.

Orchestration tools handle containers running stateless applications. The applications may be terminated at any time and/or restarted on a different machine (which is why a production database generally should not run in containers).

  • Kubernetes
  • Mesos
  • Nomad

Kubernetes: The Documentary

Kubernetes Applications

  • Stateless applications: trivial to scale, with no coordination. These can take advantage of Kubernetes deployments directly and work great behind Kubernetes Services or Ingress Services.
  • Stateful applications: postgres, mysql, etc. which generally exist as single processes and persist to disk. These systems generally should be pinned to a single machine and use a single Kubernetes persistent disk. They can be served by static configuration of pods, persistent disks, etc., or utilize StatefulSets (see the sketch after this list).
  • Static distributed applications: zookeeper, cassandra, etc which are hard to reconfigure at runtime but do replicate data around for data safety. These systems have configuration files that are hard to update consistently and are well-served by StatefulSets.
  • Clustered applications: etcd, redis, prometheus, vitess, rethinkdb, etc. are built for dynamic reconfiguration and modern infrastructure where things are often changing. They have APIs to reconfigure members in the cluster and just need glue to be operated natively and seamlessly on Kubernetes, hence the Kubernetes Operator concept.
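A minimal StatefulSet sketch for the stateful and static distributed cases (hypothetical names; nginx stands in for a real stateful service, and the headless Service is not shown):

$ kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db            # headless Service giving pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: nginx       # stand-in image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:      # one PersistentVolumeClaim per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
EOF

Each replica gets a stable name (db-0, db-1, ...) and its own PVC, which is what makes pinning storage to a single pod workable.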

Kubernetes vs OpenStack

OpenStack was launched in 2010. AWS was the only cloud, GCP didn't exist, and Docker was not a thing. The goal was to provide an open source, private alternative to AWS, building on top of VMs.

Kubernetes was launched in 2014. AWS, Azure, and GCP had become the dominant players in cloud computing, and Docker had become synonymous with containers. The goal was to be a bridge among the big 3, and between public clouds and private data centers, building on top of containers.

OpenStack is on the downtrend.

Kubernetes vs Nomad

  • Kubernetes aims to provide all the features needed to run Linux container-based applications including cluster management, scheduling, service discovery, monitoring, secrets management and more.
  • Nomad only aims to focus on cluster management and scheduling and is designed with the Unix philosophy of having a small scope while composing with tools like Consul for service discovery/service mesh and Vault for secret management.

Summary

2 key concepts: resources + controllers

each resource has a controller monitoring it, except that a ConfigMap just stores stuff and has no controller (a ConfigMap has no spec and status fields, only data)

e.g. deployment controller watches deployment resources.
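As a counterexample, a ConfigMap carries only metadata and data, with nothing for a controller to reconcile (hypothetical names):

$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: debug
EOF
$ kubectl get configmap app-config -o yaml   # note: data, but no spec/status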

Hierarchy: cluster -> namespace -> node -> pod -> container

(a Node = a kubelet running)

X as Code

In Kubernetes, everything is code, with a Git repository acting as the single source of truth.

Naming

The K8s community has a tradition of using Greek names:

  • Kubernetes (κυβερνήτης): helmsman or pilot
  • Istio (ιστίο): sail
  • Anthos (ἄνθος): flower

K8s Native

Native = using KRM (Kubernetes Resource Model) APIs

Standards

  • Container Runtime Interface (CRI): the main protocol for communication between the kubelet and the container runtime.
  • Container Storage Interface (CSI)
  • Container Network Interface (CNI)

Components

Control Plane Components

("master" components)

  • kube-apiserver: API Server
  • controllers:
    • kube-controller-manager: Controller Manager
    • cloud-controller-manager
  • kube-scheduler: Scheduler
  • etcd
  • add-ons:
    • kube-dns, dashboard, monitoring, cluster-level logging
    • keepalived & haproxy: this battle-tested duo provides control plane discovery and load balancing out of the box.

Worker Node Components

(virtual or physical machines, managed by the control plane and contains the services necessary to run Pods.)

  • kubelet: Talks to API Server.
  • kube-proxy
  • Container Runtime: e.g. containerd, a daemon on worker nodes. Manages the container lifecycle.
  • monitoring / logging: supervisord, fluentd

The Pod Lifecycle Event Generator (PLEG) is a kubelet component on each node: it periodically relists container states from the container runtime and generates pod lifecycle events, which the kubelet uses to keep the containers on its node matching each pod's desired state (e.g. restarting containers). It's possible for it to encounter issues.

The kubelet monitors resources like memory, disk space, and filesystem inodes on your cluster's nodes.

Clients

Clients use the kubectl CLI to interact with the cluster.

Containerized or not

  • Containerized (visible as pods via kubectl get pods -n kube-system): kube-apiserver, kube-scheduler, kube-proxy, etcd, etc.
  • Not containerized (run as systemd services): kubelet, containerd, docker.

API Server

API Server clients: CLI (kubectl), CI/CD (Jenkins), Dashboard / UI, kubelet, control plane components (controller-manager, scheduler, etc)

  • clients within the control plane: controllers, the scheduler. (etcd is not a client; the API server is the only component that talks to etcd.)
  • between API Server and developers: kubectl, kubeadm, REST API, client libraries (https://github.com/kubernetes-client)
  • between API Server and Nodes: kubelet

Access management:

authentication -> authorization -> admission control ("mutating" / "validating" admission controllers)

the API server implements a push-based notification stream of state changes (events), also known as Watch

One of the reasons watches are so efficient is that they are implemented as streaming APIs: the API server watches etcd via gRPC streams and serves watch events to clients over streaming HTTP responses.
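From the client side, a watch is just a long-lived request (a simple sketch):

$ kubectl get pods --watch   # streams row updates as pod state changes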

Scheduler

The scheduler is a kind of controller. Why is it separate from the controller manager? It is big enough to warrant it, and the separation makes it easy to swap in an alternative scheduler.


Namespace

  • Namespace-based scoping applies only to namespaced objects (e.g. Deployments, Services, Pods, replication controllers) and not to cluster-wide objects (e.g. StorageClass, Nodes, PersistentVolumes). Namespace resources are not themselves in a namespace. To list all namespaces: kubectl get namespace

Controllers

controllers (pieces of Go code) live in a controller-manager (a binary / container)

Controller pattern: Controllers typically read an object's .spec, possibly do things, and then update the object's .status
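You can see this pattern from the client side: you write .spec, and a controller fills in .status. A sketch (assumes a Deployment named web already exists; names are hypothetical):

$ kubectl scale deployment web --replicas=3                          # write .spec.replicas
$ kubectl get deployment web -o jsonpath='{.status.readyReplicas}'   # the controller updates .status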

controllers are clients that call into the API server (i.e. API server does not know who or where the controllers are, they are NOT registered, unlike webhooks)

e.g. the Job controller tells the API server to create or remove Pods. Other examples: the replication controller, endpoints controller, namespace controller, and serviceaccounts controller. These built-in controllers run inside kube-controller-manager.

Configuration Management

manifest = yaml

  • plain (e.g. CRDs, deployment/service with no variables but only hard coded values, configmaps)
    • kubectl apply (apply the manifests, create/update resources)
  • Helm: templates + values => yaml, good for yamls you fully own
    • helm install
  • kustomize: literal yaml + patches (does not use templates), good for yamls you do not own
    • kubectl apply -k

Helm: A package manager for Kubernetes that uses Charts, which are Go-template-based packages that ultimately generate YAML manifests for deployments. You can use a simple command like helm install prometheus prometheus-community/prometheus to deploy a functional monitoring agent (and all its requisite resources) on your existing cluster without writing a single line of YAML.

To check history of a chart:

$ helm history <chart> -n <namespace> --kubeconfig /path/to/kubeconfig

If something goes wrong, rollback manually:

$ helm rollback <chart> -n <namespace> --kubeconfig /path/to/kubeconfig

Kustomize: A configuration management tool for customizing Kubernetes objects. With Kustomize, you can take an existing manifest and apply overrides without touching the original YAML file.
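A minimal kustomize overlay sketch (hypothetical file and resource names; assumes a base/ directory whose kustomization.yaml lists a Deployment named web):

$ mkdir -p overlay
$ cat > overlay/kustomization.yaml <<'EOF'
resources:
  - ../base                  # the YAML you do not own
patchesStrategicMerge:
  - replicas-patch.yaml      # your local override
EOF
$ cat > overlay/replicas-patch.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # must match the name in the base
spec:
  replicas: 3
EOF
$ kubectl apply -k overlay/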

Life of a deployment

  • user submits a deployment.yaml to the API Server
  • the deployment is stored in etcd; only the API Server can access etcd
  • the controller-manager sees the new Deployment and creates the corresponding pods (via a ReplicaSet)
  • the scheduler assigns each pod to a node
  • the kubelet talks to the API Server, reads the schedule, and runs the pods
  • end users call the running pods through kube-proxy (kube-proxy calls the API Server to get services); the flow can be traced with kubectl, see the sketch below
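A sketch of tracing the flow (hypothetical names):

$ kubectl create deployment hello --image=nginx              # submit to the API Server
$ kubectl get replicasets,pods                               # created by the controller-manager
$ kubectl get pods -o wide                                   # NODE column: filled in by the scheduler
$ kubectl get events --sort-by=.metadata.creationTimestamp   # Scheduled, Pulled, Started, ...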

APIs

API Conventions: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md

  • core group:
    • REST Path: /api/v1
    • apiVersion: "core" is skipped, i.e. apiVersion: v1.
  • extensions (named groups):
    • REST Path: /apis/$GROUP_NAME/$VERSION
    • apiVersion: $GROUP_NAME/$VERSION, e.g. apiVersion: batch/v1

The /api endpoint is already legacy and used only for core resources (pods, secrets, configmaps, etc.). A more modern and generic /apis/<group-name> endpoint is used for the rest of resources, including user-defined custom resources.
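You can poke at both paths directly through the API server (a sketch):

$ kubectl api-versions                      # list all group/versions
$ kubectl get --raw /api/v1 | head          # legacy core group
$ kubectl get --raw /apis/batch/v1 | head   # a named group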

API Groups

  • core: v1
  • apps/v1, batch/v1, policy/v1, autoscaling/v2
  • k8s.io: app.k8s.io, metrics.k8s.io, networking.k8s.io, node.k8s.io, storage.k8s.io, rbac.authorization.k8s.io
  • x-k8s.io: cluster.x-k8s.io

Add-ons

  • add-on examples: CoreDNS, Dashboard, etc.
  • add-ons can be in the form of Kubernetes Operators.
  • add-ons can be installed with Helm.

Cert Manager

cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters, and simplifies the process of obtaining, renewing and using those certificates.

https://github.com/cert-manager/cert-manager
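A common way to install it is via Helm (a sketch; chart values vary by version):

$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update
$ helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager --create-namespace \
    --set installCRDs=true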

Service Accounts

  • each pod is assigned a ServiceAccount (the namespace's default one unless specified), and a token for it is mounted on the pod's file system.
  • in older Kubernetes versions this token is a Secret volume automatically mounted into every pod; newer versions mount a short-lived projected token instead.

Learning Resources

SIGS

Cluster API

https://github.com/kubernetes-sigs/cluster-api

Provisioning, upgrading, and operating multiple Kubernetes clusters.

kubeadm support is built in (the default bootstrap and control plane providers are kubeadm-based).

Book: https://cluster-api.sigs.k8s.io/

Admin clusters

  • provides a Kubernetes Resource Model (KRM) API for managing the lifecycle of multiple user clusters.
  • provides a single location to store/cache common policies for multiple user clusters.

In ABM, admin clusters run on-premises. To support edge deployments with limited resource footprints, the admin cluster can run remotely in a different datacenter or region, or in a public cloud.

Upgrade

kubeadm upgrade

Use kubeadm upgrade to upgrade. The upgrade procedure on control plane nodes and worker nodes should be executed one node at a time.

If kubeadm upgrade fails and does not roll back, for example because of an unexpected shutdown during execution, you can run kubeadm upgrade again. This command is idempotent and eventually makes sure that the actual state is the desired state you declare.

kubeadm manages the lifecycle of components like kube-apiserver, kube-scheduler, kube-controller-manager, etcd, and the kubelet.
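A typical upgrade sequence (a sketch; the version is illustrative, and upgrading the kubelet/kubectl packages themselves is distro-specific):

# on the first control plane node:
$ kubeadm upgrade plan
$ kubeadm upgrade apply v1.26.0
# on each remaining control plane / worker node:
$ kubeadm upgrade node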

Drain and undrain

Use kubectl drain to safely evict all of the pods from a node before you perform maintenance on the node (e.g. kernel upgrade, hardware maintenance, etc.). Alternatively can call eviction API.
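To drain:

$ kubectl drain <node name> --ignore-daemonsets

(--ignore-daemonsets is usually needed, since DaemonSet pods cannot be evicted.)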

To undrain:

$ kubectl uncordon <node name>

In Tree

"in-tree" meaning their code was part of the core Kubernetes code and shipped with the core Kubernetes binaries.

Access Control

  • Namespaces: segment pods by application or work group, support multi-tenancy.
  • RBAC: assign roles to users for specific namespaces (see the sketch below).
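A minimal RBAC sketch (hypothetical namespace dev and user alice): a Role that can read pods, bound to a user.

$ kubectl apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
  - apiGroups: [""]            # "" = the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
  - kind: User
    name: alice
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF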

Bootstrapping

Example:

  • create a KIND cluster
  • use the KIND cluster to bootstrap an admin cluster
  • admin cluster manages the control planes for tenant admin cluster
  • tenant cluster further manages tenant user clusters

Registry depends on storage, storage depends on registry

  • option 1: ordering when setting up the clusters (run the registry in systemd with a subset of images)
  • option 2: kind bootstrapping cluster + pivot

Bootstrap the Bootstrapper

Use a machine (or VM) as the bootstrapper, install OS and necessary tools.

Kind Cluster

  • Spin up a Kind cluster (see the commands below)
  • Kind cluster - install Cluster API and other controllers
  • Kind cluster - create admin cluster
  • Pivot the cluster lifecycle resources into admin cluster
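To create a kind cluster:

kind create cluster --name $name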

To list kind clusters:

kind get clusters

To delete a cluster by name:

kind delete cluster --name $name

To get the kind cluster kubeconfig:

kind get kubeconfig --name $name > ~/.kube/config

Pivot

Pivoting: moving objects from the ephemeral k8s cluster (the Kind cluster) to a target cluster (the newly created admin cluster).

The process:

  • Pause any reconciliation of objects.
  • Once all the objects are paused, the objects are created on the other side on the target cluster and deleted from the ephemeral cluster.

Delete Kind cluster.
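With Cluster API this is done by clusterctl move (a sketch; see the link below):

$ clusterctl move --to-kubeconfig=/path/to/admin-kubeconfig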

https://cluster-api.sigs.k8s.io/clusterctl/commands/move.html#bootstrap--pivot

Create User Clusters (If Multi-tenancy)

  • Admin cluster - create user clusters

Static Pods

Static Pods are defined in /etc/kubernetes/manifests (When installing Kubernetes with the kubeadm tool.)

Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them; i.e. the kubelet watches /etc/kubernetes/manifests.

The static Pods created by kubeadm are under the namespace kube-system.
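A minimal static Pod sketch (hypothetical names; drop a manifest into the watched directory and the kubelet starts it, no API server involved):

cat > /etc/kubernetes/manifests/echo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: echo
spec:
  containers:
    - name: echo
      image: busybox
      command: ["sh", "-c", "while true; do echo hello; sleep 10; done"]
EOF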

To check kubelet status: systemctl status kubelet

To check kubelet logs: journalctl -u kubelet

To check static pods logs:

crictl ps
crictl logs <container>

The kubelet automatically creates a mirror pod on the API server for each static pod. This means the pods running on a node are visible on the API server, but cannot be controlled from there.

To check the mirror Pods on the API server:

kubectl get pods

How to force restart a pod

kubectl get pod PODNAME -n NAMESPACE -o yaml | kubectl replace --force -f -

How to debug

With kubectl:

kubectl get pod
kubectl get event
kubectl logs

Check the admin node:

  • /etc/containerd/config.toml for container configs
  • /etc/kubernetes/manifests for static pod manifests
  • crictl logs for static pod logs
  • journalctl -u kubelet for kubelet logs

Change the kubelet verbosity level in /var/lib/kubelet/config.yaml:

logging.verbosity: 4

(then restart the kubelet)

Security

Secure computing mode (seccomp): restricts the system calls available to a process; any system call not on the allowlist is disallowed.
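A sketch of opting a pod into the container runtime's default seccomp profile (hypothetical names):

$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault   # block syscalls outside the runtime's default allowlist
  containers:
    - name: app
      image: nginx
EOF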

Check status

kubectl get --raw='/readyz?verbose'

Logging

klog: the Kubernetes logging library, a fork of glog.

Containerized-Data-Importer (CDI)

Containerized-Data-Importer (CDI) is a persistent storage management add-on for Kubernetes. Its primary goal is to provide a declarative way to build Virtual Machine disks on PVCs for KubeVirt VMs.

CDI provides the ability to populate PVCs with VM images or other data upon creation.

Harbor

API:

goharbor.io/v1beta1, kind: HarborCluster:

  • cache: Redis
  • database: PostgreSQL
  • storage: FileSystem

Registry

  • Docker Hub: a public registry https://hub.docker.com/
  • Harbor: an open source option. Deployed in cluster.
  • From Cloud Providers, e.g. Google Container Registry / Artifact Registry.

Registry API

Docker Registry v2 API is a well-known specification for any Docker registry (docker.io, gcr.io, Harbor, etc.). Any time you do a docker pull (or other Docker commands), the Docker client interacts with the registry using the protocol defined in the Docker Registry v2 API specification.

The OCI Distribution Specification is an effort to standardize the Docker Registry v2 API, and is (intentionally) largely the same. Most Docker registries on the market nowadays support it in addition to the Docker Registry v2 API. https://github.com/opencontainers/distribution-spec

eBPF vs sidecar

eBPF: makes the kernel programmable. Write programs that run in the kernel, triggered by events. Similar to how JS allows us to dynamically change the behavior of a web page.

  • a sidecar has a view across just one pod (per pod) and is part of the app configuration (my-app.yaml)
  • eBPF does not need any app config change (programs live in the kernel)
  • eBPF is triggered by events, regardless of whether the pod is running or not
  • eBPF can see ALL activity on the node (all pods on this node)
  • sidecar: does not need access to the node
  • eBPF is kernel programming; a sidecar is easier to write

eBPF observability tool: Pixie, Cilium Hubble.

Cilium uses eBPF to provide efficient networking and connectivity, and a sidecarless service mesh (Istio ambient mesh is also sidecarless).

Dev Lifecycle

Code - (Build) -> Container -> (Push) -> Registry -> (Deploy) -> K8s Cluster

Code Structure Convention

  • pkg/apis: define types; using kubebuilder
  • pkg/controllers: define logic
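A sketch of scaffolding such a layout with kubebuilder (hypothetical domain/group/kind names; note that newer kubebuilder versions scaffold api/ and internal/controller/ rather than pkg/):

$ kubebuilder init --domain example.com --repo example.com/guestbook
$ kubebuilder create api --group webapp --version v1 --kind Guestbook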

IAM / IdP

According to CloudFlare: An identity provider (IdP) is a service that stores and verifies user identity. IdPs are typically cloud-hosted services, and they often work with single sign-on (SSO) providers to authenticate users.

Red Hat Single Sign-On (the commercial version of Keycloak).

OIDC: OpenID Connect https://openid.net/connect/

Databases

postgres: https://github.com/zalando/spilo

Billing and Provisioning

The Account Tracking and Automation Tool (ATAT) is a cloud provisioning tool developed by the Hosting and Compute Center of the Defense Information Systems Agency (DISA). ATAT is responsible for two major areas of functionality:

  • Provisioning: provisions resources in a cloud environment.
  • Billing: reports historical and forecasted cost information.

Notes

Most Kubernetes components are stateless; the state of each component comes from the etcd db files.