Kubernetes is winning the container orchestration war.
Orchestration tools handle containers running stateless applications. The applications may be terminated at any time and/or restarted on a different machine (which means a production database should not run in containers).
Kubernetes: The Documentary
- Part 1: https://www.youtube.com/watch?v=BE77h7dmoQU
- Part 2: https://www.youtube.com/watch?v=318elIq37PE
- Stateless applications: trivial to scale, with no coordination needed. These can use Kubernetes Deployments directly and work great behind Kubernetes Services or Ingress.
- Stateful applications: postgres, mysql, etc., which generally run as single processes and persist to disk. These systems should generally be pinned to a single machine and use a single Kubernetes persistent disk. They can be served by static configuration of pods, persistent disks, etc., or use StatefulSets.
- Static distributed applications: zookeeper, cassandra, etc., which are hard to reconfigure at runtime but do replicate data around for data safety. These systems have configuration files that are hard to update consistently and are well-served by StatefulSets (see the sketch after this list).
- Clustered applications: etcd, redis, prometheus, vitess, rethinkdb, etc., which are built for dynamic reconfiguration and modern infrastructure where things are often changing. They have APIs to reconfigure cluster members and just need glue to be operated natively and seamlessly on Kubernetes; hence the Kubernetes Operator concept.
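A minimal StatefulSet sketch for the stateful cases above (name, image, and sizes are placeholders):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                  # hypothetical name
spec:
  serviceName: db           # headless Service that gives each pod a stable DNS name
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:     # each replica gets its own PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Each replica keeps a stable identity (db-0, db-1, ...) and its own volume across restarts, which is exactly the pinning behavior described above.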
OpenStack was launched in 2010. AWS was the only cloud, GCP didn't exist, and Docker was not a thing. The goal was to provide an open-source, private alternative to AWS, building on top of VMs.
Kubernetes was launched in 2014. AWS, Azure, and GCP had become the dominant cloud players, and Docker had become synonymous with containers. The goal was to be a bridge among the big 3, and between public clouds and private data centers, building on top of containers.
OpenStack is on the downtrend.
- Kubernetes aims to provide all the features needed to run Linux container-based applications including cluster management, scheduling, service discovery, monitoring, secrets management and more.
- Nomad focuses only on cluster management and scheduling, and is designed with the Unix philosophy of having a small scope while composing with tools like Consul for service discovery/service mesh and Vault for secrets management.
2 key concepts: resources + controllers
each resource has a controller monitoring it, except ConfigMaps, which just store stuff with no controller (a ConfigMap does not have a status field; it only holds data)
e.g. the Deployment controller watches Deployment resources.
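To make the spec/status split concrete: a Deployment carries a user-written spec and a controller-written status, while a ConfigMap has only data (values below are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:                       # desired state, written by the user
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
status:                     # observed state, written by the Deployment controller
  availableReplicas: 3
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:                       # just stored data; no spec, no status, no controller
  LOG_LEVEL: info
```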
Hierarchy: cluster -> namespace -> pod -> container (namespaces group objects logically; pods run on nodes, which are cluster-scoped machines, not namespaced objects).
In Kubernetes, everything is code; a Git repository should act as the single source of truth.
The K8S group has a tradition of using Greek names
- Kubernetes (κυβερνήτης): helmsman or pilot
- Istio (ιστίο): sail
- Anthos (ἄνθος): flower
Native = using KRM APIs.
- Container Runtime Interface (CRI): the main protocol for communication between the kubelet and the container runtime.
- Container Storage Interface (CSI)
- Container Network Interface (CNI)
On the Control Plane
kube-apiserver: API Server
kube-controller-manager: Controller Manager
keepalived + haproxy: this battle-tested duo provides control plane discovery and load balancing out of the box.
On Worker Nodes (virtual or physical machines, managed by the control plane, that contain the services necessary to run Pods)
kubelet: Talks to API Server.
- Container Runtime: e.g. containerd, a daemon on worker nodes that manages the container lifecycle.
The Pod Lifecycle Event Generator (PLEG) is a kubelet component on each node that watches the container runtime and generates pod lifecycle events, helping the kubelet keep the node's actual state in line with the desired state. This might mean restarting containers, and it's possible for PLEG itself to encounter issues (the well-known "PLEG is not healthy" error).
kubelet monitors resources like memory, disk space, and filesystem inodes on your cluster's nodes.
- Containerized (can be found via kubectl get pods -A): e.g. the control plane components (API server, controller manager, scheduler, etcd), kube-proxy, CoreDNS.
- Not containerized (run as host systemd services): e.g. kubelet and the container runtime.
API Server clients: CLI (kubectl), CI/CD (Jenkins), Dashboard / UI, kubelet, control plane components (controller-manager, scheduler, etc)
- clients within the control plane: controllers and the scheduler (etcd is not a client; rather, the API server is etcd's client).
- between API Server and developers: kubeadm, REST API, client libraries (https://github.com/kubernetes-client)
- between API Server and Nodes: kubelet, kube-proxy.
Every request to the API Server passes through: authentication -> authorization -> admission control ("mutating" / "validating" admission controllers).
the API server implements a push-based notification stream of state changes (events), also known as Watch
One of the reasons why watches are so efficient is that, under the hood, the API server's own watch on etcd uses etcd's gRPC streaming API; clients consume the watch as a streaming HTTP response.
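E.g. a watch can be opened from the command line:
kubectl get pods --watch
or directly against the REST API:
kubectl get --raw '/api/v1/namespaces/default/pods?watch=true'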
The scheduler is a kind of controller. Why is it separate from the controller manager? It is big enough to stand on its own, and the separation makes it easy to plug in an alternative scheduler.
- Namespace-based scoping applies only to namespaced objects (e.g. Deployments, Services, Pods) and not to cluster-wide objects (e.g. StorageClass, Nodes, PersistentVolumes). Namespace resources are not themselves in a namespace. To get all the namespaces:
kubectl get namespace
controllers (pieces of Go code) live in a controller-manager (a binary / container)
Controller pattern: controllers typically read an object's .spec, possibly do things, and then update the object's .status.
controllers are clients that call into the API server (i.e. API server does not know who or where the controllers are, they are NOT registered, unlike webhooks)
e.g. the Job controller tells the API server to create or remove Pods. Other examples: the replication controller, endpoints controller, namespace controller, and serviceaccounts controller. These are built-in controllers that run inside kube-controller-manager.
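For example, the Job controller reads .spec and records progress in .status (the classic pi Job from the Kubernetes docs; the status value is illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  completions: 1            # desired: one successful completion
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pi
          image: perl:5.34
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
status:
  succeeded: 1              # written by the Job controller as Pods finish
```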
- Helm: templates + values => yaml; good for yamls you fully own.
- kustomize: literal yaml + patches (does not use templates); good for yamls you do not own (see the sketch below).
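A minimal kustomize sketch (file names and values are hypothetical): the upstream yaml stays untouched and a patch is layered on top:

```yaml
# kustomization.yaml
resources:
  - deployment.yaml          # yaml you do not own, kept verbatim
patches:
  - target:
      kind: Deployment
      name: web
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
```

Render it with kubectl kustomize . or apply it with kubectl apply -k .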
- user submits a deployment.yaml to the API Server
- deployment.yaml is stored in etcd; only API Server can access etcd
- controller-manager sees the Deployment and creates the corresponding pods
- scheduler: assigns a pod to a node.
- kubelet talks to the API Server, reads the schedule, and runs the pods
- end users call the running pods through kube-proxy (kube-proxy calls the API Server to get services)
Webhooks may run as containers in k8s; webhooks can be used to extend admission control. E.g. Istio / Linkerd register admission webhooks: the user submits normal yaml configs, and the "Mutating Admission" stage adds the sidecar container.
There are 3 kinds of webhooks:
- admission webhook; 2 types: mutating and validating (see the sketch after this list).
- authorization webhook
- CRD conversion webhook
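A sketch of registering a mutating admission webhook (names, namespace, and path are hypothetical; a real config also needs a caBundle so the API server trusts the webhook's TLS certificate):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector            # hypothetical
webhooks:
  - name: inject.example.com        # hypothetical; must be a qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:                      # the webhook itself runs as a Service in-cluster
        name: sidecar-injector
        namespace: infra
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
```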
Webhook vs Binary Plugin:
- Webhook model: Kubernetes makes a network request to a remote service.
- Binary Plugin model: Kubernetes executes a binary (program). Binary plugins are used by the kubelet (e.g. network plugins) and by kubectl (kubectl plugins).
- core group: REST path /api/v1/<resource>; the group name "core" is skipped in apiVersion, i.e. you write apiVersion: v1, not core/v1.
- named groups: REST path /apis/<group>/<version>/<resource>, e.g. apiVersion: apps/v1 maps to /apis/apps/v1.
The /api endpoint is legacy and used only for core resources (pods, secrets, configmaps, etc.). The more modern and generic /apis/<group-name> endpoint is used for all other resources, including user-defined custom resources.
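E.g. the same API server can be queried at both styles of path:
kubectl get --raw /api/v1/namespaces/default/pods
kubectl get --raw /apis/apps/v1/namespaces/default/deployments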
- add-on examples: CoreDNS, Dashboard, etc.
- add-ons can be in the form of Kubernetes Operators.
- add-ons can be installed by helm.
cert-manager adds certificates and certificate issuers as resource types in Kubernetes clusters, and simplifies the process of obtaining, renewing and using those certificates.
- each pod is assigned a ServiceAccount by default; a default secret token is mounted on every pod's file system.
- i.e. each pod gets a Secret volume automatically mounted (see the sketch below).
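A sketch of what this looks like from inside a pod; the mount path below is the standard ServiceAccount token location:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo                      # hypothetical
spec:
  serviceAccountName: default     # this line is implicit if omitted
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
# at runtime the token volume is automounted at:
#   /var/run/secrets/kubernetes.io/serviceaccount/{token,ca.crt,namespace}
```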
- Kubernetes the hard way: https://github.com/kelseyhightower/kubernetes-the-hard-way
- kubebuilder: https://book.kubebuilder.io/
Provisioning, upgrading, and operating multiple Kubernetes clusters.
kubeadm is built-in.
- An admin cluster provides a Kubernetes Resource Model (KRM) API for managing the lifecycle of multiple user clusters.
- It also provides a single location to store/cache common policies for multiple user clusters.
In ABM (Anthos clusters on bare metal), admin clusters run on-premises. To support edge deployments with limited resource footprints, the admin cluster can run remotely in a different datacenter or region, or in a public cloud.
Use kubeadm upgrade to upgrade. The upgrade procedure on control plane nodes and worker nodes should be executed one node at a time.
If kubeadm upgrade fails and does not roll back, for example because of an unexpected shutdown during execution, you can run kubeadm upgrade again. This command is idempotent and eventually makes sure that the actual state is the desired state you declare.
kubeadm manages the lifecycles of control plane components like the API server, controller manager, scheduler, and etcd.
Use kubectl drain to safely evict all of the pods from a node before you perform maintenance on it (e.g. kernel upgrade, hardware maintenance). Alternatively, call the eviction API directly.
$ kubectl drain <node name> --ignore-daemonsets
When maintenance is done, make the node schedulable again:
$ kubectl uncordon <node name>
"in-tree" meaning their code was part of the core Kubernetes code and shipped with the core Kubernetes binaries.
- Namespaces: segment pods by application or work group, support multi-tenancy.
- RBAC: assign roles to users for specific namespaces.
Use a machine (or VM) as the bootstrapper, install OS and necessary tools.
- Spin up a Kind cluster
- Kind cluster - install Cluster API and other controllers
- Kind cluster - create admin cluster
- Pivot the cluster lifecycle resources into admin cluster
To list kind clusters:
kind get clusters
To delete a cluster by name:
kind delete cluster --name $name
To get a kind cluster's kubeconfig:
kind get kubeconfig --name $name > ~/.kube/config
Pivoting: moving objects from the ephemeral k8s cluster (the Kind cluster) to a target cluster (the newly created admin cluster); see the clusterctl note after the steps below.
- Pause any reconciliation of objects.
- Once all the objects are paused, the objects are created on the other side on the target cluster and deleted from the ephemeral cluster.
- Delete the Kind cluster.
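With Cluster API, this pivot is typically performed with clusterctl, which pauses reconciliation, copies the objects to the target cluster, and deletes them from the source (the kubeconfig path is a placeholder):
clusterctl move --to-kubeconfig=admin-cluster.kubeconfig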
- Admin cluster - create user clusters
Static Pods are defined in /etc/kubernetes/manifests (when installing Kubernetes with kubeadm, the control plane components are created as static pods there).
Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them; i.e. the kubelet itself watches the manifest directory.
The kubeadm control-plane static pods live under the kube-system namespace.
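A minimal static pod sketch (hypothetical example): drop a manifest like this into /etc/kubernetes/manifests and the kubelet runs it without any API server involvement:

```yaml
# /etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
```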
To check the kubelet:
systemctl status kubelet
journalctl -u kubelet
To check static pods logs:
crictl ps
crictl logs <container>
The kubelet automatically creates a mirror pod on the API server for each static pod. This means the pods running on a node are visible on the API server, but cannot be controlled from there.
To check the mirror Pods on the API server:
kubectl get pods
To force-recreate a pod from its live spec:
kubectl get pod PODNAME -n NAMESPACE -o yaml | kubectl replace --force -f -
Useful debugging commands:
kubectl get pod
kubectl get event
kubectl logs
Check on the admin node:
- /etc/containerd/config.toml for container runtime configs
- /etc/kubernetes/manifests for static pod manifests
- crictl logs for static pod logs
- journalctl -u kubelet for kubelet logs
Change verbosity level:
logging.verbosity => 4
Secure computing mode (seccomp): restricts the system calls available to a process; any system calls not on the allowlist are disallowed.
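In pod specs, seccomp is wired up via securityContext; a minimal sketch using the container runtime's default profile:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo              # hypothetical
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault        # the runtime's default syscall allowlist
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
```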
To remove all admission webhooks (e.g. when they block debugging):
kubectl delete mutatingwebhookconfigurations --all
kubectl delete validatingwebhookconfigurations --all
Gatekeeper deploys one validating webhook and one mutating webhook that watch all kinds in all API groups.
It’s basically one big webhook that checks all constraints created via Gatekeeper yamls.
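A sketch of one such constraint yaml, following the K8sRequiredLabels example from the Gatekeeper docs (assumes the matching ConstraintTemplate is already installed; the name and label are placeholders):

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner        # hypothetical
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]      # enforce only on Namespaces
  parameters:
    labels: ["owner"]             # required label key
```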
We cannot use Gatekeeper when the validation logic requires queries to the API server. For those more complicated policies, we need to write our own webhook.
To check API server health:
kubectl get --raw='/readyz?verbose'
- klog: a fork of glog, used throughout Kubernetes for logging.
Containerized-Data-Importer (CDI) is a persistent storage management add-on for Kubernetes. Its primary goal is to provide a declarative way to build Virtual Machine disks on PVCs for KubeVirt VMs.
CDI provides the ability to populate PVCs with VM images or other data upon creation.
- cache: Redis
- database: PostgreSQL
- storage: FileSystem