Kubernetes
Why Kubernetes?
Kubernetes is winning the container orchestration war.
Orchestration tools manage containers, which may be terminated at any time and/or restarted on a different machine. Examples: Kubernetes, Mesos, Nomad.
Kubernetes vs OpenStack
OpenStack was launched in 2010. At the time, AWS was the only major cloud, GCP didn't exist yet, and Docker was not a thing. The goal was to provide an open source, private alternative to AWS, built on top of VMs.
Kubernetes was launched in 2014. By then AWS, Azure, and GCP had become the dominant players in cloud computing, and Docker had become synonymous with containers. The goal was to be a bridge among the big 3, and between public clouds and private data centers, built on top of containers.
OpenStack is on the downtrend.
Kubernetes vs Nomad
- Kubernetes aims to provide all the features needed to run Linux container-based applications including cluster management, scheduling, service discovery, monitoring, secrets management and more.
- Nomad focuses on cluster management and scheduling and is designed with the Unix philosophy of having a small scope while composing with tools like Consul for service discovery / service mesh and Vault for secret management.
Kubernetes Applications
- Stateless applications: trivial to scale, with no coordination. These can take advantage of Kubernetes `Deployment`s directly and work great behind Kubernetes `Service`s.
- Stateful applications: postgres, mysql, etc, which generally exist as single processes and persist to disks. These systems generally should be pinned to a single machine and use a single Kubernetes persistent disk. These systems can be served by static configuration of pods and persistent disks, or can use `StatefulSet`s.
- Static distributed applications: zookeeper, cassandra, etc, which are hard to reconfigure at runtime but do replicate data around for data safety. These systems have configuration files that are hard to update consistently and are well-served by `StatefulSet`s.
- Clustered applications: etcd, redis, prometheus, vitess, rethinkdb, etc are built for dynamic reconfiguration and modern infrastructure where things are often changing. They have APIs to reconfigure members in the cluster and just need glue to be operated natively and seamlessly on Kubernetes; hence the Kubernetes Operator concept.
How does Kubernetes work?
K8s = Resources + Controllers
2 key concepts in Kubernetes: Resources + Controllers
- A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind.
  - Built-in Resources: e.g. `Pod`, `Service`, `Job`, etc.
  - Custom Resources: extensions of the Kubernetes API.
    - Defined in CRDs (Custom Resource Definitions). The Go data models (Go structs) are turned into CRD manifests at build time, and those manifests are applied to the Kubernetes cluster.
    - The CRDs then become available as APIs on the Kubernetes cluster.
- Each resource has reconcilers monitoring it. The reconcilers are then packaged into a larger controller.
  - Controller pattern: Controllers typically read an object's `.spec`, possibly do things, and then update the object's `.status`.
  - Controllers are clients that call into the API server (i.e. the API server does not know who or where the controllers are; they are NOT registered, unlike webhooks).
  - e.g.
    - Deployment controller: watches `Deployment` resources.
    - Job controller: tells the API server to create or remove Pods.
    - Other examples: replication controller, endpoints controller, namespace controller, and serviceaccounts controller.
- Controllers (Go code) live in a controller-manager (a binary / container).
  - Built-in controllers run inside the `kube-controller-manager`.
Note: not all resources have controllers monitoring them. One exception is `ConfigMap`, which just stores data (`ConfigMap` does not have `spec` and `status` fields, but a `data` field).
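The controller pattern described above (read `.spec`, act, write `.status`) can be sketched as a reconcile function. This is a minimal in-memory illustration in Python, not how real controllers are built (those are Go clients that watch the API server). The `Resource` class, the `reconcile` helper, and the `replicas` / `readyReplicas` fields are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy stand-in for an API object with desired (spec) and observed (status) state."""
    name: str
    spec: dict
    status: dict = field(default_factory=dict)


def reconcile(obj: Resource, running: list) -> list:
    """Drive the observed state (running pods) toward obj.spec, then record status.

    `running` is a hypothetical list of pod names owned by obj.
    Returns the list of actions taken, purely for illustration.
    """
    desired = obj.spec.get("replicas", 1)
    actions = []
    # Create pods while there are too few.
    while len(running) < desired:
        pod = f"{obj.name}-{len(running)}"
        running.append(pod)
        actions.append(f"create {pod}")
    # Remove pods while there are too many.
    while len(running) > desired:
        actions.append(f"delete {running.pop()}")
    # Report what was observed back onto the object.
    obj.status["readyReplicas"] = len(running)
    return actions


web = Resource(name="web", spec={"replicas": 3})
pods = []
first = reconcile(web, pods)    # creates three pods
second = reconcile(web, pods)   # already converged, so no actions
```

Calling `reconcile` repeatedly is safe: once the observed state matches the spec, it does nothing, which is the property that lets controllers simply re-run whenever something changes.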
Hierarchy
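cluster -> node -> pod -> container (where things run)
cluster -> namespace -> pod -> container (how things are grouped)
(a Node = a machine with a `kubelet` running; Nodes are cluster-scoped, not namespaced)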
X as Code
In Kubernetes, everything is code. A Git repository should act as the single source of truth.
Naming
The K8s community has a tradition of using Greek names:
- Kubernetes (κυβερνήτης): helmsman or pilot.
- Istio (ιστίο): sail.
- Anthos (ἄνθος): flower.
Standards
- Container Runtime Interface (CRI): for the communication between the kubelet and Container Runtime.
- Container Storage Interface (CSI)
- Container Network Interface (CNI)
- OCI Image Spec
- OCI Runtime Spec
- OCI Distribution Spec: for talking to artifact registries (like Harbor or Docker Registry).
Scopes
- Cluster level.
- Namespaced: Namespace-based scoping is applicable only for namespaced objects (e.g. `Deployment`s, `Service`s, `Pod`s, replication controllers, etc) and not for cluster-wide objects (e.g. `StorageClass`, `Node`s, `PersistentVolume`s, etc). `Namespace` resources are not themselves in a namespace. To get all the namespaces: `kubectl get namespace`.
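The scope of a resource shows up directly in its REST path: namespaced resources carry a `/namespaces/<name>/` segment, while cluster-scoped ones do not. The helper below is a small sketch of that layout; `resource_path` is a hypothetical function written for this note, not part of any client library.

```python
from typing import Optional


def resource_path(group: str, version: str, plural: str,
                  namespace: Optional[str] = None) -> str:
    """Build the REST path for a resource collection.

    The core group (empty string) lives under /api; named groups live under /apis.
    Passing a namespace adds the /namespaces/<ns> segment used by namespaced resources.
    """
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    if namespace is not None:
        return f"{prefix}/namespaces/{namespace}/{plural}"
    return f"{prefix}/{plural}"


# Namespaced resources
pods = resource_path("", "v1", "pods", namespace="default")
deployments = resource_path("apps", "v1", "deployments", namespace="default")
# Cluster-scoped resources
nodes = resource_path("", "v1", "nodes")
storage_classes = resource_path("storage.k8s.io", "v1", "storageclasses")
```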
Configuration Management
"manifest" = yaml files.
- Plain:
  - e.g. CRDs, deployments/services with no variables but only hard-coded values, configmaps.
  - `kubectl apply` (apply the manifests, create/update resources)
- Helm:
  - templates + values => yaml
  - uses Charts, which package Go templates that ultimately generate YAML manifests for deployments.
  - Good for yamls you fully own.
  - `helm install`
  - Helm Cheatsheet
- Kustomize:
  - literal yaml + patches
  - takes an existing manifest and applies overrides without touching the original YAML file; does not use templates.
  - Good for yamls you do not own.
  - `kubectl apply -k`
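The difference between the Helm and Kustomize approaches can be illustrated with a toy sketch. Here Python's `string.Template` stands in for Helm's Go templates, and a recursive dictionary overlay stands in for a Kustomize patch; both are simplifications, and the `render` and `apply_patch` helpers are hypothetical names used only for this illustration.

```python
import copy
from string import Template

# Helm style: template + values => manifest text.
# string.Template is only a stand-in for Go templates.
TEMPLATE = Template(
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  replicas: $replicas\n"
)


def render(values: dict) -> str:
    """Fill the template placeholders with the given values."""
    return TEMPLATE.substitute(values)


# Kustomize style: a literal base manifest plus a patch; the base is left untouched.
BASE = {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 1}}


def apply_patch(base: dict, patch: dict) -> dict:
    """Recursively overlay patch onto a copy of base (a simplified merge)."""
    out = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = apply_patch(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


rendered = render({"name": "web", "replicas": 3})
patched = apply_patch(BASE, {"spec": {"replicas": 3}})
```

The template approach needs placeholders written into the manifest up front, which is why it suits manifests you own. The patch approach works on any literal manifest, which is why it suits manifests you do not own.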
Code repos
- in-tree: `github.com/kubernetes/kubernetes`
- other repos in `github.com/kubernetes`:
  - `github.com/kubernetes/apimachinery`
  - `github.com/kubernetes/minikube`
  - `github.com/kubernetes/client-go`
- repos in `github.com/kubernetes-sigs`:
  - `github.com/kubernetes-sigs/kind`
  - `github.com/kubernetes-sigs/kustomize`
  - `github.com/kubernetes-sigs/cluster-api`
  - `github.com/kubernetes-sigs/gateway-api`
  - `github.com/kubernetes-sigs/kubebuilder`
Add-ons
- add-on examples: CoreDNS, Dashboards (e.g. Grafana), etc.
- add-ons can be in the form of Kubernetes Operators.
- add-ons can be installed by Helm.
Multi-tenancy / Admin clusters
Benefits of using admin clusters:
- provides a Kubernetes Resource Model (KRM) API for managing the lifecycle (bootstrap, upgrade, config / policy updates, deletion) of multiple user clusters. (Otherwise you need some non-standard way to manage the clusters, such as manual steps or ad hoc scripts.)
- provides a single location to store / cache common policies for multiple user clusters.
In Anthos on bare metal (ABM), admin clusters run on-premises. To support edge deployments with limited resource footprints, the admin cluster can run remotely in a different datacenter or region, or in a public cloud.
Terminologies
- "in-tree": meaning their code was part of the core Kubernetes code and shipped with the core Kubernetes binaries.
- KRM: the Kubernetes Resource Model. The declarative format you use to talk to the Kubernetes API. Basically the yaml file you see and interact with (the yaml with `apiVersion`, `kind`, `metadata`, `spec`, and `status`).
- K8s Native: Native = using KRM APIs.
How to debug
With `kubectl`:
- `kubectl get pod`
- `kubectl get event`
- `kubectl logs`
Check the admin node:
- Check the config files listed below.
- `crictl logs` for static pod logs
- `journalctl -u kubelet` for kubelet logs
Change verbosity level:
- `/var/lib/kubelet/config.yaml`: `logging.verbosity` => 4
How to add logs?
klog:
- https://github.com/kubernetes/klog
- a fork of glog
Files
- `/etc/kubernetes/admin.conf`: kubeconfig.
- `/etc/kubernetes/manifests/`: static pods.
- `/etc/kubernetes/pki`: stores certificates (if you install Kubernetes with `kubeadm`).
- `/etc/kubernetes/pki/apiserver.crt`: apiserver cert.
- `/var/lib/kubelet/config.yaml`: kubelet config.
- `/etc/containerd/config.toml`: containerd (container runtime) config.
Dev Lifecycle
Code - (Build) -> Container Images -> (Push) -> Registry -> (Deploy) -> K8s Cluster
Code Structure Convention
- `pkg/apis`: define types; using kubebuilder
- `pkg/controllers`: define logic
Databases
postgres: https://github.com/zalando/spilo
Billing and Provisioning
The Account Tracking and Automation Tool (ATAT) is a cloud provisioning tool developed by the Hosting and Compute Center of the Defense Information Systems Agency (DISA). ATAT is responsible for two major areas of functionality:
- Provisioning: provisions resources in a cloud environment.
- Billing: reports historical and forecasted cost information.
Extending K8s
2 primary ways to extend the Kubernetes API:
- with `CustomResourceDefinition`s
- with the Kubernetes API Aggregation Layer
For Option 2: run an extension API server in Pod(s) that run in your cluster. It can be used to integrate your apiserver with other external systems (e.g. storage APIs other than `etcd`). Unlike Custom Resource Definitions (CRDs), the Aggregation API involves another server, your extension apiserver, in addition to the standard Kubernetes apiserver. https://github.com/kubernetes-sigs/apiserver-builder-alpha is one way to build API server extensions.
With CRDs
The preferred way to extend APIs.
new API = CRDs + Controllers + Webhooks + ...
= CRDs (yamls)
+ backend pod for controllers and webhooks (binary)
+ backend manifests (`Service`, `Deployment`)
+ webhook manifests (`ValidatingWebhookConfiguration` / `MutatingWebhookConfiguration`)
+ `Role`s / `RoleBinding`s / `ServiceAccount`s
+ ...
With Aggregation Layer
The aggregation layer runs in-process with the kube-apiserver.
- Setting up an extension API server: run an extension API server in a `Pod` in your cluster.
  - can build with the `apiserver-builder` library
- Register an API: add an `APIService` object that "claims" the URL path (`/apis/xxx/v1/...`) in the Kubernetes API.
- The aggregation layer will proxy anything sent to that API path to the registered `APIService`.
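The proxying behaviour described above amounts to a lookup on the group and version in the request path. The following is a sketch under stated assumptions: the `API_SERVICES` registry, the `metrics.example.com` group, and the backend names are all hypothetical, and the real aggregation layer also handles authentication, discovery, and TLS to the backend.

```python
from typing import Dict, Tuple

# Hypothetical registry: (group, version) claimed by an APIService -> backend name.
API_SERVICES: Dict[Tuple[str, str], str] = {
    ("metrics.example.com", "v1"): "extension-apiserver",
}


def route(path: str) -> str:
    """Return which server handles the request path.

    Paths of the form /apis/<group>/<version>/... whose group and version are
    claimed by a registered APIService go to the extension server; everything
    else is served by the built-in apiserver.
    """
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "apis":
        backend = API_SERVICES.get((parts[1], parts[2]))
        if backend is not None:
            return backend
    return "kube-apiserver"


claimed = route("/apis/metrics.example.com/v1/widgets")
builtin_group = route("/apis/apps/v1/deployments")
core_group = route("/api/v1/pods")
```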
Infrastructure Providers
- Cloud Infrastructure Providers: AWS, Azure, and Google
- Bare-metal Infrastructure Providers: VMware, MAAS, and metal3.io.
Certificates / Trust
Client validates server cert
The client validates the server's certificate, either against a trusted Certificate Authority (CA) or against an inline CA certificate in the `certificate-authority-data` field of the cluster section of the kubeconfig:
$ kubectl config view --minify --raw --output 'jsonpath={..cluster.certificate-authority-data}' | base64 -d > /tmp/kubectl-cacert
$ curl --cacert /tmp/kubectl-cacert $(kubectl config view --minify --output 'jsonpath={..cluster.server}')
The client-side validation of server certificates is the cause of the `Unable to connect to the server: x509: certificate signed by unknown authority` error.
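The same client-side check can be expressed with Python's standard `ssl` module. This is only a sketch of the idea, not what `kubectl` does internally. The `make_client_context` helper and its `ca_file` parameter are hypothetical; `ca_file` would point at a PEM file such as the decoded `certificate-authority-data`.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that validates the server certificate.

    With ca_file=None the system trust store is used; otherwise only the CA
    certificates in the given PEM file are loaded.
    """
    return ssl.create_default_context(cafile=ca_file)


ctx = make_client_context()
# A default client context refuses servers whose certificate chain cannot be
# verified, and also checks that the certificate matches the host name.
requires_cert = ctx.verify_mode == ssl.CERT_REQUIRED
checks_hostname = ctx.check_hostname
```

If the server's certificate is signed by a CA that is not in the loaded trust store, the TLS handshake fails, which is the situation behind the "certificate signed by unknown authority" error.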
Server validates client cert
A client certificate is validated by the server.
Server-side apply
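- server-side apply = partial (per field) apply: the client sends only the fields it cares about, and the API server merges them and tracks which manager owns each field.
- client-side apply = the client gets the manifest (yaml), modifies it, and applies the updated yaml; the merge is computed on the client.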
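The per-field idea behind server-side apply can be sketched with a flat map of field paths and one owner per field. This is a deliberately simplified model: real server-side apply works on structured objects, allows several managers to share a field, and removes fields a manager stops sending. The `server_side_apply` function, the `Conflict` exception, and the manager names are all hypothetical.

```python
class Conflict(Exception):
    """Raised when a manager tries to change a field owned by another manager."""


def server_side_apply(live: dict, owners: dict, manager: str,
                      fields: dict, force: bool = False) -> None:
    """Apply only the given fields to `live`, tracking one owner per field.

    `live` maps flat field paths to values; `owners` maps paths to manager names.
    """
    # Check for conflicts before mutating anything, so a failed apply changes nothing.
    for path, value in fields.items():
        owner = owners.get(path)
        changed = live.get(path) != value
        if owner not in (None, manager) and changed and not force:
            raise Conflict(f"{path} is owned by {owner}")
    for path, value in fields.items():
        live[path] = value
        owners[path] = manager


live, owners = {}, {}
server_side_apply(live, owners, "kubectl", {"spec.replicas": 3, "spec.image": "nginx"})
# A second manager takes over one field; the other field is untouched.
server_side_apply(live, owners, "autoscaler", {"spec.replicas": 5}, force=True)
try:
    server_side_apply(live, owners, "kubectl", {"spec.replicas": 3})
    conflicted = False
except Conflict:
    conflicted = True
```

Because each manager only sends the fields it cares about, two managers can safely work on different parts of the same object, and a clash on the same field is reported instead of being silently overwritten.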
Running Kubernetes
| | Cloud | On Prem |
|---|---|---|
| Control Plane HA | load balancer service | haproxy + keepalived |
| Service LB | load balancer service | metallb |
| Node | VM Instance (EC2 or GCE) | bare metal or VMware |
Nodes: k8s can run on
- VMs on Cloud: e.g. GKE on Google Cloud, or Anthos Multi Cloud on other clouds.
- VMs on VMware: e.g. Anthos on VMware.
- Bare-metal: e.g. Anthos Bare-metal.
Swap
Swap was not supported initially (alpha in 1.22, beta in 1.28) and is still not recommended.
Why?
- Performance: Swap's performance depends on the underlying physical storage, but in general it is worse than regular memory.
- Predictability: Having swap available on a system reduces predictability, because swap changes the system's behaviour under memory pressure.
- Security: critical information such as Kubernetes Secrets may be swapped out to disk; if the swap space is not encrypted, other users may be able to read it.
Kubernetes 1.28 has Beta support for using swap on Linux: it collects node-level metric statistics, which can be accessed at the `/metrics/resource` and `/stats/summary` kubelet HTTP endpoints.
Deprecation / Migration
- The `EndpointSlice` API is the recommended replacement for `Endpoints`.
- `Ingress` => `Gateway`
- `kube-dns` => `coredns`
- label: `k8s-app` => `app.kubernetes.io/name`
- abac => rbac
- Migrate from PodSecurityPolicy to the built-in PodSecurity Admission Controller.
CoreDNS's service is still called `kube-dns`, and the label is still `k8s-app`.
APIs under development
Cluster API
https://github.com/kubernetes-sigs/cluster-api
Provisioning, upgrading, and operating multiple Kubernetes clusters.
`kubeadm` is built-in.
Book: https://cluster-api.sigs.k8s.io/
Gateway API
Similar to Istio's `Gateway`, more powerful than K8s `Ingress`.
https://github.com/kubernetes-sigs/gateway-api
Nested Virtualization
- minikube, --driver=docker, --container-runtime=docker, on your laptop
  - Node: docker creates a container on your laptop as the cluster node
  - Pod: docker containers running on the docker container node
- kind, same as minikube, --driver=docker, --container-runtime=containerd
- minikube, --driver=kvm2, --container-runtime=containerd, on your laptop
  - Node: KVM based VMs running on your laptop.
  - Pod: containers (managed by containerd) running inside the KVM based VMs.
- GKE
  - Node:
    - Google owns bare-metal nodes;
    - VMs are created on the bare-metals (GCE);
  - Pod: containers running inside the cloud VMs.
- GCE, minikube, --driver=docker, --container-runtime=docker
  - docker on docker on VM on bare-metal
Learning Resources
- Kubernetes the hard way: https://github.com/kelseyhightower/kubernetes-the-hard-way
- kubebuilder: https://book.kubebuilder.io/
- Kubernetes: The Documentary