Kubernetes - Containers
Terminology
- Image: a read-only immutable template that defines how a container will be realized.
- Container: a runtime instance of an image.
- Pod: a group of one or more containers; the pod is the smallest deployable unit in Kubernetes.
- Dockerfile: a text document that contains all the commands a user could call on the command line to assemble an image.
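- Containerfile: equivalent to a Dockerfile and uses the same syntax (the name preferred by tools such as Podman and Buildah).
- OCI (Open Container Initiative): low-level specifications (container image format, runtime and distribution).
- CRI (Container Runtime Interface): a higher-level API through which Kubernetes (the kubelet) talks to container runtimes.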
What are Containers?
Think of "container" as just another packaging format.
Just like .iso files for disk images, .deb/.rpm for Linux packages, or .zip/.tgz for binaries and arbitrary files.
The ecosystem is more than just a format; it includes:
- Image (the packaging format)
- Distribution
- Runtime
- Orchestration
Unlike traditional virtualization, containerization takes place at the kernel level: containers share the host's kernel instead of each booting their own. Most modern operating system kernels now support the primitives needed for containerization; on Linux these were provided by OpenVZ and Linux-VServer and, more recently, by LXC.
Put simply, a container image is a tar file containing tar files, where each inner tar file is a layer.
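To make the "tar of tars" idea concrete, here is a minimal Python sketch that builds a toy image archive in memory (one inner tar per layer plus a manifest.json listing the layers) and then reads the layers back. The layout, including the hypothetical layerN.tar names, is a simplified assumption loosely modeled on the output of docker save, not the exact Docker or OCI image format.

```python
import io
import json
import tarfile


def make_layer(files: dict[str, bytes]) -> bytes:
    """Pack a {path: content} mapping into an in-memory tar file (one layer)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_image(layers: list[dict[str, bytes]]) -> bytes:
    """Build a toy image: an outer tar holding one inner tar per layer plus a manifest."""
    buf = io.BytesIO()
    names = []
    with tarfile.open(fileobj=buf, mode="w") as outer:
        for i, files in enumerate(layers, start=1):
            name = f"layer{i}.tar"  # hypothetical naming scheme
            blob = make_layer(files)
            info = tarfile.TarInfo(name=name)
            info.size = len(blob)
            outer.addfile(info, io.BytesIO(blob))
            names.append(name)
        # The manifest records the layer order (bottom layer first).
        manifest = json.dumps([{"Layers": names}]).encode()
        info = tarfile.TarInfo(name="manifest.json")
        info.size = len(manifest)
        outer.addfile(info, io.BytesIO(manifest))
    return buf.getvalue()


def list_layers(image: bytes) -> list[list[str]]:
    """Return, for each layer in manifest order, the file paths it contains."""
    with tarfile.open(fileobj=io.BytesIO(image), mode="r") as outer:
        manifest = json.load(outer.extractfile("manifest.json"))
        result = []
        for name in manifest[0]["Layers"]:
            inner_bytes = outer.extractfile(name).read()
            with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r") as inner:
                result.append(inner.getnames())
        return result


if __name__ == "__main__":
    image = make_image([{"bin/sh": b"#!shell"}, {"app/main.py": b"print('hi')"}])
    print(list_layers(image))  # [['bin/sh'], ['app/main.py']]
```

Because each layer is a separate archive, a runtime can cache and share layers between images and stack them (e.g. with an overlay filesystem) to form the container's root filesystem.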
Read more: Containers vs VMs
What are Container Runtimes?
In 2020, Kubernetes v1.20 deprecated Docker as a container runtime (the dockershim integration was later removed in v1.24) in favor of runtimes that implement the Container Runtime Interface (CRI), such as containerd and CRI-O. (Note that Docker is still a useful tool for building containers, and the images produced by docker build still run in your Kubernetes cluster.)
runc: the low-level container runtime (the thing that actually creates and runs containers). It includes libcontainer, a native Go implementation for creating containers. Docker donated runc to the OCI; runc (https://github.com/opencontainers/runc) is a CLI tool for spawning and running containers according to the OCI specification.
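runc consumes an OCI bundle: a directory containing a root filesystem plus a config.json that follows the OCI runtime specification (runc spec generates a default one). The Python sketch below builds a deliberately small subset of such a config and writes it into a bundle directory. The command, rootfs path, hostname and namespace list are illustrative assumptions; a real config from runc spec contains many more fields (mounts, capabilities, resource limits, etc.).

```python
import json
import os
import tempfile


def minimal_oci_config(args: list[str], rootfs: str = "rootfs") -> dict:
    """Return a small, illustrative subset of an OCI runtime-spec config.json."""
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": args,  # command run as the container's first process
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        "root": {"path": rootfs, "readonly": True},  # relative to the bundle directory
        "hostname": "demo",  # hypothetical hostname
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [{"type": t} for t in ("pid", "network", "ipc", "uts", "mount")],
        },
    }


def write_bundle_config(bundle_dir: str, config: dict) -> str:
    """Write config.json into the given bundle directory and return its path."""
    path = os.path.join(bundle_dir, "config.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bundle:
        print(write_bundle_config(bundle, minimal_oci_config(["sh"])))
```

With a root filesystem unpacked into rootfs/ next to config.json, the bundle could then be started from that directory with something like sudo runc run <container-id>.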
- High-level runtimes:
  - containerd: CNCF graduated project; contributors include Google, Microsoft, Alibaba, etc. It came out of Docker and was made CRI-compliant. Uses runc under the hood.
  - CRI-O: CNCF incubating project; contributors include Red Hat, IBM, Intel, etc. Built from the ground up for Kubernetes.
Docker's default runtime is runc:
$ docker run --runtime=runc ...
gVisor can be integrated with Docker by registering its runtime, runsc ("run sandboxed container"), with the Docker daemon and selecting it instead of runc:
$ docker run --runtime=runsc ...
gVisor runs slower than Docker's default runtime because of the sandboxing overhead: https://github.com/google/gvisor/issues/102
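For --runtime=runsc to work, runsc has to be registered under the "runtimes" key of the daemon's configuration file, /etc/docker/daemon.json (gVisor's runsc install command can do this for you). The Python sketch below merges such an entry into existing daemon.json text while keeping the other settings. The binary path /usr/local/bin/runsc is an assumption, and the daemon has to be reloaded or restarted to pick up the change.

```python
import json


def register_runtime(daemon_json: str, name: str, path: str) -> str:
    """Return daemon.json text with an extra OCI runtime registered under "runtimes".

    Existing settings and runtimes are preserved; an entry with the same
    name is overwritten.
    """
    config = json.loads(daemon_json) if daemon_json.strip() else {}
    runtimes = config.setdefault("runtimes", {})
    runtimes[name] = {"path": path}
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    current = '{"log-driver": "json-file"}'
    # /usr/local/bin/runsc is an assumed install location for the gVisor binary.
    print(register_runtime(current, "runsc", "/usr/local/bin/runsc"))
```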
Notable Projects
- cgroups (control groups): a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, network, etc.) of groups of processes. Isolating what processes can see (process IDs, mounts, network interfaces, etc.) is the job of namespaces, a separate kernel feature that containers combine with cgroups.
- Linux Containers (LXC): built on top of cgroups (and namespaces); an operating-system-level virtualization technology for running multiple isolated Linux systems (containers) on a single control host.
- LXD: similar to LXC, but provides a REST API on top of liblxc.
- Docker: an open source Linux containerization technology; a packaging, distribution and runtime solution.
  - Docker runs application containers, while LXC/LXD run system containers.
  - Docker initially used liblxc but later switched to libcontainer.
- containerd: the container daemon. Docker spun out its container runtime and donated it to the CNCF, where containerd is now a graduated project. It uses runc as its runtime and is used by Docker, Kubernetes, AWS ECS, etc.
- gVisor: a user-space kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. It leverages existing host kernel functionality and runs as a normal user-space process. It is intended for running untrusted workloads, with lower memory and startup overhead than a full VM.
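The cgroups entry above can be inspected directly. On Linux, each process's cgroup membership is exposed in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:cgroup-path. On a pure cgroup v2 system there is a single line starting with 0::. The Python sketch below parses that format. The sample lines are made up for illustration, and reading /proc/self/cgroup only works on Linux.

```python
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: list[str]  # empty for the unified cgroup v2 hierarchy
    path: str


def parse_proc_cgroup(text: str) -> list[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup (format described in cgroups(7))."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path itself may contain ':', so split at most twice.
        hid, controllers, path = line.split(":", 2)
        entries.append(CgroupEntry(int(hid), [c for c in controllers.split(",") if c], path))
    return entries


if __name__ == "__main__":
    # Made-up example: a process in a Docker container on a hybrid v1/v2 host.
    sample = "12:memory:/docker/abc123\n11:cpu,cpuacct:/docker/abc123\n0::/system.slice/docker-abc123.scope\n"
    for entry in parse_proc_cgroup(sample):
        print(entry)
    try:
        with open("/proc/self/cgroup") as f:  # Linux only
            print(parse_proc_cgroup(f.read()))
    except FileNotFoundError:
        print("no /proc/self/cgroup (not running on Linux?)")
```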
Docker
When we talk about Docker, we usually mean Docker Engine, which consists of:
- the Docker daemon (dockerd);
- a REST API that specifies the interfaces for interacting with the daemon;
- a command line interface (CLI) client (docker) that talks to the daemon through the REST API, e.g. docker run <image> or docker image ls.
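Because the docker CLI is just a client of the daemon's REST API, the same API can be called directly. The Python sketch below uses only the standard library to send HTTP requests over the daemon's Unix socket. It assumes the default socket path /var/run/docker.sock and a user allowed to access it. GET /version and GET /images/json are Docker Engine API endpoints. The helper names UnixHTTPConnection and engine_get are made up for this example.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)  # host only fills the Host header
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def engine_get(path: str, socket_path: str = "/var/run/docker.sock"):
    """GET a Docker Engine API path (e.g. '/images/json') and decode the JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"{path}: HTTP {resp.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        print(engine_get("/version")["Version"])  # daemon version, as in docker version
        for image in engine_get("/images/json"):  # roughly what docker image ls lists
            print(image["Id"], image.get("RepoTags"))
    except OSError as exc:  # no daemon, no socket, or no permission
        print(f"could not reach the Docker daemon: {exc}")
```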
Because Docker operates at the OS level rather than emulating hardware, it can also run inside a VM.
The two most important APIs are the Image and Container APIs.
Docker Root Dir: the directory where images are stored, e.g. /var/lib/docker/.
Docker Compose vs Docker Stack
- docker-compose: a tool for defining and running multi-container Docker applications. The original docker-compose is a separate tool written in Python that internally uses the Docker API to bring up containers according to the specification.
- docker stack: built into the docker CLI, so no additional packages are needed; written in Go. It is not really a successor of docker-compose: docker stack deploys to a Docker Swarm, while docker-compose targets a single Docker Engine.
- Both work with docker-compose.yml, but docker stack only works with version 3 of the file format.
Docker for Mac
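Docker for Mac runs the Docker daemon inside a lightweight Linux VM, because containers need a Linux kernel. The Docker for Mac application does not use docker-machine to provision that VM; it creates and manages the VM directly.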
Podman
Podman is a daemonless container engine for developing, managing, and running OCI containers on your Linux system. Containers can be run either as root or in rootless mode. It is supported by kind and developed by Red Hat.
Podman is based on libpod, a library for container lifecycle management.
Podman vs Docker
The most significant difference between Docker and Podman is that Docker uses a client-server architecture, where a daemon runs on every host that needs to run containers, whereas Podman uses a single-process architecture with no central daemon. Because Podman does not route everything through a shared daemon, it can avoid security issues tied to the daemon-based architecture, such as sharing the PID namespace with all other containers.
Podman can run containers without root privileges, so a compromised container does not automatically mean root access on the host.
The Docker daemon runs as root by default, so an attacker who compromises it effectively gains root access to your machine.
Podman runs as a regular user and does not require root privileges.
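Rootless Podman relies on user namespaces. Processes that look like root inside the container are mapped to unprivileged UIDs on the host. By default, container UID 0 maps to the invoking user's own UID, and container UIDs from 1 upward map onto the user's subordinate range from /etc/subuid (format user:start:count). The Python sketch below parses such lines and translates a container UID to a host UID under that default layout. The user name and numbers are example values, and real setups can customize the mapping (e.g. with podman's --uidmap option).

```python
from typing import NamedTuple, Optional


class SubIdRange(NamedTuple):
    user: str
    start: int
    count: int


def parse_subuid(text: str) -> list[SubIdRange]:
    """Parse /etc/subuid-style lines of the form user:start:count."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        user, start, count = line.split(":")
        ranges.append(SubIdRange(user, int(start), int(count)))
    return ranges


def host_uid(container_uid: int, user_uid: int, sub: SubIdRange) -> Optional[int]:
    """Map a container UID to a host UID using the default rootless layout:
    container 0 -> the user's own UID, container 1..count -> start..start+count-1.
    Returns None for container UIDs that have no mapping."""
    if container_uid == 0:
        return user_uid
    if 1 <= container_uid <= sub.count:
        return sub.start + container_uid - 1
    return None


if __name__ == "__main__":
    alice = parse_subuid("alice:100000:65536\n")[0]  # example /etc/subuid entry
    print(host_uid(0, 1000, alice))   # 1000: "root" in the container is just alice
    print(host_uid(33, 1000, alice))  # 100032: an unprivileged UID on the host
```

The design point is that even container "root" never holds host UID 0, which is why a breakout from a rootless container lands in an unprivileged account.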
Who's Not Using Containers?
Well, containers keep gaining momentum and popularity, and many companies are adopting them.
Two notable exceptions are Google and Facebook.
Google has its own packaging format, MPM. MPM on Borg plays a role similar to containers on Kubernetes, and Kubernetes is, loosely, the open-source counterpart of Borg.
Facebook uses Tupperware. Why not Docker? Because Docker did not exist yet when these systems were built.