Containers

Updated: 2020-03-22

What is a Container?

Think of a "container" as just another packaging format, like .iso files for disk images, .deb/.rpm for Linux packages, or .zip/.tgz for binaries or arbitrary files.

The ecosystem is more than just a format; it includes:

  • Image (package)
  • Distribution
  • Runtime
  • Orchestration
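To make the "image (package)" item concrete, here is a minimal sketch of how an image can be thought of: file-system layers stored as tar archives, each referenced by media type, SHA-256 digest and size. The media type strings follow the OCI image format, but the file name, the layer contents and the two-field config are invented for illustration, and the manifest is abridged; it is not something a registry would necessarily accept.

```python
import hashlib
import io
import json
import tarfile


def make_layer(files):
    """Pack {path: bytes} into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def describe(blob, media_type):
    """Build a content descriptor: media type, sha256 digest and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


# Hypothetical layer and config, for illustration only.
layer = make_layer({"app/hello.txt": b"hello from a layer\n"})
config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()

manifest = {
    "schemaVersion": 2,
    "config": describe(config, "application/vnd.oci.image.config.v1+json"),
    "layers": [describe(layer, "application/vnd.oci.image.layer.v1.tar")],
}
print(json.dumps(manifest, indent=2))
```

Because every blob is addressed by its digest, distribution mostly comes down to moving blobs that the receiving side does not already have.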

Unlike traditional virtualization, containerization takes place at the kernel level. Most modern operating system kernels now support the primitives necessary for containerization, including Linux with OpenVZ, Linux-VServer and, more recently, LXC.

VM vs (Traditional) Container vs Sandboxed Container

VM: runs on top of a hypervisor, and each VM has its own guest OS.

     VM1        VM2
|----------|----------|
|   App    |   App    |
|==========|==========| => System Calls
|  Guest   |  Guest   |
|  Kernel  |  Kernel  |
|----------|----------|
| Virtual  | Virtual  |
| Hardware | Hardware |
|----------|----------|
|   Hypervisor(VMM)   |
|=====================| => System Calls
|     Host Kernel     |
|---------------------|
|    Host Hardware    |
|---------------------|

Traditional Container (e.g. Docker, LXC): operating-system-level virtualization. Containers share the host OS kernel, which isolates them from each other with namespaces and limits their resource usage with cgroups.

|----------|----------|
|   App    |   App    |
|----------|----------|
|   Container Layer   |
|---------------------|
|     Host Kernel     |
|---------------------|
|    Host Hardware    |
|---------------------|
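Namespaces are visible from user space, which makes the "shared kernel" picture easy to inspect. The sketch below assumes a Linux host with /proc mounted and lists the namespaces the current process belongs to; the function names are hypothetical. Two processes that print the same inode number for an entry are in the same namespace of that kind.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def current_namespaces(ns_dir="/proc/self/ns"):
    """Map each entry under ns_dir to its namespace inode; {} if ns_dir is missing."""
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        result[name] = inode
    return result


for name, inode in current_namespaces().items():
    print(f"{name:20} {inode}")
```

Running this on the host and again inside a container should show different inode numbers for the namespaces the container runtime has unshared.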

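Sandboxed Container (e.g. Google gVisor, Amazon Firecracker, IBM Nabla): adds an extra isolation layer between the application and the host kernel. gVisor, shown below, provides a user-space kernel; Firecracker instead runs workloads in lightweight virtual machines (microVMs).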

|----------|----------|
|   App    |   App    |
|==========|==========| => System Calls
|        gVisor       |
|=====================| => Limited System Calls
|     Host Kernel     |
|---------------------|
|    Host Hardware    |
|---------------------|

Read more about sandboxed containers

OCI: Open Container Initiative

https://www.opencontainers.org/

Defines two important specs, so that an image packed by one tool can be unpacked and run by different runtimes:

  • Runtime Specification (runtime-spec)
  • Image Specification (image-spec)

runc (https://github.com/opencontainers/runc) is a CLI tool for spawning and running containers according to the OCI specification.
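An OCI runtime such as runc works on a "bundle": a directory holding a root file system plus a config.json that describes the process to start. The sketch below builds an abridged config.json as a Python dict. The field names follow the OCI runtime specification, but the values (rootfs path, hostname, environment) are illustrative assumptions, and a real configuration usually also needs mounts and other settings that are left out here.

```python
import json


def minimal_config(args, rootfs="rootfs", hostname="sketch"):
    """Return an abridged OCI runtime configuration as a dict."""
    return {
        "ociVersion": "1.0.0",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "env": ["PATH=/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        # Path of the root file system, relative to the bundle directory.
        "root": {"path": rootfs, "readonly": True},
        "hostname": hostname,
        "linux": {
            # Each entry asks the runtime to create a new namespace of that type.
            "namespaces": [
                {"type": "pid"},
                {"type": "network"},
                {"type": "ipc"},
                {"type": "uts"},
                {"type": "mount"},
            ]
        },
    }


print(json.dumps(minimal_config(["sh"]), indent=2))
```

In practice `runc spec` generates a complete default config.json, which is a better starting point than writing one by hand.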

Notable Projects

  • Docker: an open-source Linux containerization technology that covers packaging, distribution and runtime.
  • containerd: a container daemon. Docker spun out its container runtime and donated it to the CNCF, where containerd is now a graduated project. It uses runc as its runtime and is used by Docker, Kubernetes, AWS ECS, etc.
  • cgroups: a Linux kernel feature that limits and isolates resource usage (CPU, memory, disk I/O, network, etc.).
  • LXC (Linux Containers)
  • gVisor: a user-space kernel for containers. It limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. It leverages existing host kernel functionality and runs as a normal user-space process. It is meant for running untrusted workloads, with lower memory and startup overhead than a full VM.

Runtime

Docker's default runtime: runc

$ docker run --runtime=runc ...

gVisor can be integrated with Docker by changing the runtime from runc to runsc ("run sandboxed container"):

$ docker run --runtime=runsc ...
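The --runtime flag only accepts runtimes the Docker daemon already knows about. Extra runtimes are registered under the "runtimes" key of the daemon configuration file (commonly /etc/docker/daemon.json), after which the daemon has to be restarted. The sketch below only builds that fragment; the helper name is hypothetical and the runsc install path is an assumption that depends on how gVisor was installed.

```python
import copy
import json


def register_runtime(daemon_config, name, path):
    """Return a copy of a daemon.json dict with one extra runtime entry."""
    updated = copy.deepcopy(daemon_config)
    updated.setdefault("runtimes", {})[name] = {"path": path}
    return updated


# The install path below is an assumption; use wherever runsc actually lives.
config = register_runtime({}, "runsc", "/usr/local/bin/runsc")
print(json.dumps(config, indent=4))
```

The helper copies its input so that an existing configuration is never modified in place; writing the result back to disk is deliberately left out.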

gVisor runs slower than the default Docker runtime because of the sandboxing: https://github.com/google/gvisor/issues/102

Orchestration

  • Kubernetes
  • Mesos
  • Nomad

LXC vs LXD vs cgroups vs Docker

  • Linux Containers (LXC): an operating-system-level virtualization technology, built on top of cgroups and namespaces, for running multiple isolated Linux systems (containers) on a single control host (e.g. a CoreOS instance).
  • cgroups: a kernel feature that limits, accounts for and isolates the resource usage (CPU, memory, disk I/O, etc.) of process groups. Namespace isolation is a separate kernel feature that container tools combine with cgroups.
  • LXD: builds on LXC, adding a REST API on top of liblxc.
  • Docker is an application container, while LXC/LXD are system containers. Docker initially used liblxc but later changed to libcontainer.
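Whatever tool sits on top, cgroup membership can be read straight from the kernel. The sketch below assumes a Linux host and parses /proc/self/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path (on cgroup v2 the controller list is empty and the ID is 0). The type and function names are hypothetical.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line):
    """Parse one 'hierarchy-ID:controller-list:cgroup-path' line."""
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    return CgroupEntry(
        int(hierarchy_id),
        [c for c in controllers.split(",") if c],
        path,
    )


def read_cgroups(proc_file="/proc/self/cgroup"):
    """Return the parsed entries of proc_file, or [] if it cannot be read."""
    try:
        with open(proc_file) as handle:
            return [parse_cgroup_line(line) for line in handle if line.strip()]
    except OSError:
        return []


for entry in read_cgroups():
    print(entry)
```

Inside a container the reported path typically differs from the one seen on the host, which is a quick way to tell which cgroup a runtime placed a process in.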

Who's Not Using Containers?

Containers are gaining momentum and popularity, and many companies are adopting them.

Two notable exceptions are Google and Facebook.

Google has its own packaging format: MPM. MPM on Borg is similar to a container on Kubernetes, and Kubernetes is an open-source system that grew out of Google's experience with Borg.

Facebook uses Tupperware. Why not Docker? It didn't exist then.

Tupperware resources: