Kubernetes - Storage

Last Updated: 2023-01-08

2 components

There are two components to storage in a Kubernetes cluster: system storage and application storage.

  • system storage: stored locally, on the control plane nodes (e.g., etcd, keys, certificates) and on the worker nodes (e.g., logs, metrics).
    • Etcd: fault tolerance can be achieved either through master replication (i.e., running multiple masters, each using non-fault-tolerant local storage) or by a single master writing to / reading from fault-tolerant storage.
    • Keys, certificates, and audit logs: require encryption and restricted mutability.
    • System logs (e.g., Fluentd) and metrics (e.g., Prometheus): may not require fault-tolerant storage, since they are usually exported to the cloud and typically need local storage only for buffering (e.g., to cover up to 24h of network unavailability).
  • application storage: requires CSI drivers for customer-provided external storage. Options:
    • use pre-existing fault-tolerant on-prem storage solutions like NetApp or EMC
    • use a storage solution on top of a K8s cluster.
      • fault-tolerant K8s-managed storage: e.g. Ceph, EdgeFS, etc.
      • non-fault-tolerant: e.g. Persistent Local Volumes.

Access modes

  • ReadWriteOnce (RWO) – only one node is allowed to access the storage volume at a time for read and write access. RWO is supported by all PVs.
  • ReadOnlyMany (ROX) – many nodes may access the storage volume in read-only mode. ROX is supported primarily by file and file-like protocols, e.g., NFS and CephFS. However, some block protocols are also supported, such as iSCSI.
  • ReadWriteMany (RWX) – many nodes may simultaneously read and write to the storage volume. RWX is supported by file and file-like protocols only, such as NFS.
  • ReadWriteOncePod (RWOP) – the volume can be mounted as read-write by a single Pod; use this when you need to ensure that only one Pod across the whole cluster can read or write the volume.

https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

An SSD or block storage device can only be mounted by a single VM instance, so it would be ReadWriteOnce (only one node can read/write to it).

File-based volumes (or file shares such as EFS and FSx) allow numerous (Many) consumers to connect to them and read/write data at the same time. A file storage mount, say an NFS/Samba share, can be mounted on multiple virtual machines simultaneously.
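
In a PVC spec the access mode is requested under accessModes; a minimal sketch (claim name and size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim              # placeholder name
spec:
  accessModes:
    - ReadWriteOnce             # block-backed volumes are typically RWO
  resources:
    requests:
      storage: 10Gi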

Key requirements of K8s system storage

  • fault tolerance (persisted state must be durable) and
  • bootstrapping (storage must be available even before the cluster control plane is fully operational)

Ephemeral storage vs Persistent storage

Ephemeral storage:

  • Standard Kubernetes volume primitives: emptyDir, secret, configMap, downwardAPI, etc.
  • Backed by local disks
  • Manage sharing via Pod ephemeral-storage requests/limits and node allocatable resources (see the sketch below)
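
A minimal sketch of an emptyDir volume with ephemeral-storage requests/limits (pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod             # placeholder name
spec:
  containers:
    - name: app
      image: busybox            # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
      resources:
        requests:
          ephemeral-storage: "1Gi"   # counted against node allocatable
        limits:
          ephemeral-storage: "2Gi"   # the Pod is evicted if it exceeds this
  volumes:
    - name: scratch
      emptyDir: {}              # backed by the node's local disk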

Persistent storage:

  • Standard Kubernetes persistent volume primitives: PersistentVolumeClaim, PersistentVolume, StorageClass, VolumeSnapshot (requires CSI driver support).
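
For the VolumeSnapshot primitive, a minimal sketch assuming a CSI driver with snapshot support and an existing VolumeSnapshotClass (all names are placeholders):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-claim-snap                    # placeholder name
spec:
  volumeSnapshotClassName: csi-snapclass   # placeholder; must map to a CSI driver with snapshot support
  source:
    persistentVolumeClaimName: data-claim  # the PVC to snapshot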

Application storage

Examples:

  • k8s -> Trident -> ONTAP
  • k8s -> Rook -> Ceph

Cloud big 3:

  • Amazon EBS
  • Google Persistent Disk
  • Azure Disk Storage

Commercial:

  • NetApp Trident
  • Red Hat Container Storage Platform
  • MayaData Kubera
  • Portworx
  • Robin
  • StorageOS
  • Diamanti

Traditional Storage Vendors:

  • NetApp
  • Dell EMC
  • Pure Storage
  • HPE Storage

Open Source Projects

  • Ceph
  • Longhorn
  • OpenEBS
  • Rook

Backend technology or protocols

  • iSCSI
  • NFS

CSI

Using CSI, third-party storage providers can write and deploy plugins exposing new storage systems in Kubernetes without ever having to touch the core Kubernetes code.

CSI is a specification: a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes. Kubernetes has its own implementation of the CSI interface.

CSI driver: referenced as the provisioner in a StorageClass; a PVC references the StorageClass in its spec via storageClassName.

kind: StorageClass
provisioner: csi-driver.example.com
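
A PVC then selects that class by name; a minimal sketch (the storage class name and size are placeholders, and the claim name matches the Pod example below):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-request-for-storage
spec:
  storageClassName: example-csi-class   # placeholder; must match the StorageClass metadata.name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi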

Pod to PVC:

kind: Pod
spec:
  volumes:
    - name: foo
      persistentVolumeClaim:
        claimName: my-request-for-storage

Where is CSI called

  • Kubelet directly issues CSI calls (like NodeStageVolume, NodePublishVolume, etc.) to CSI drivers via a Unix Domain Socket to mount and unmount volumes.
  • Kubelet discovers CSI drivers (and the Unix Domain Socket to use to interact with a CSI driver) via the kubelet plugin registration mechanism.
  • Kubernetes master components do not communicate directly (via a Unix Domain Socket or otherwise) with CSI drivers. Kubernetes master components interact only with the Kubernetes API.

HostPath

A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
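
A minimal hostPath sketch (pod name, image, and path are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pod            # placeholder name
spec:
  containers:
    - name: app
      image: busybox            # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: host-logs
          mountPath: /host-logs
  volumes:
    - name: host-logs
      hostPath:
        path: /var/log          # directory on the node's filesystem
        type: Directory         # must already exist on the node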

NetApp

NetApp Harvest: The default package collects performance, capacity and hardware metrics from ONTAP clusters. https://github.com/NetApp/harvest

NetApp Trident

https://github.com/NetApp/trident

Trident is an external provisioner controller:

  • runs as a K8s Pod or Deployment; provides dynamic storage orchestration services for your Kubernetes workloads.
  • monitors activity on PVC / PV / StorageClass objects
  • a single provisioner for different storage platforms (ONTAP and others)
  • Trident CSI driver talks to ONTAP REST API

How Trident interacts with K8s (from the official Trident documentation):

  • A user creates a PersistentVolumeClaim requesting a new PersistentVolume of a particular size from a Kubernetes StorageClass that was previously configured by the administrator.
  • The Kubernetes StorageClass identifies Trident as its provisioner and includes parameters that tell Trident how to provision a volume for the requested class.
  • Trident looks at its own Trident StorageClass with the same name that identifies the matching Backends and StoragePools that it can use to provision volumes for the class.
  • Trident provisions storage on a matching backend and creates two objects: a PersistentVolume in Kubernetes that tells Kubernetes how to find, mount and treat the volume, and a Volume in Trident that retains the relationship between the PersistentVolume and the actual storage.
  • Kubernetes binds the PersistentVolumeClaim to the new PersistentVolume. Pods that include the PersistentVolumeClaim will mount that PersistentVolume on whatever host they run on.
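
A sketch of the StorageClass from step 2, using Trident's CSI driver name; the class name and backendType parameter are assumptions to illustrate the idea (check the Trident docs for your version):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-gold                   # placeholder name
provisioner: csi.trident.netapp.io   # Trident's CSI driver
parameters:
  backendType: "ontap-nas"           # assumption: tells Trident which backend/storage pools to use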

Ceph

  • as a containerized workload: 3x replication, HA
  • as a non-containerized deployment: runs directly on the machine, managed as systemd services.

Storage

apiVersion: storage.k8s.io/v1
kind: CSIDriver

apiVersion: ceph.rook.io/v1
kind: CephCluster
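
A trimmed-down CephCluster sketch in the Rook style; the image tag, MON count, and device selection are illustrative assumptions (see the Rook docs for the full spec):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17     # assumption: pick a Ceph release supported by your Rook version
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                         # three MONs for quorum
  storage:
    useAllNodes: true
    useAllDevices: true              # let Rook create OSDs on all empty devices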

An external cluster is a Ceph configuration that is managed outside of the local K8s cluster. The external cluster could be managed by cephadm, for example.

Components:

  • Ceph's foundation is a low-level data store named RADOS that provides a common backend for multiple user-consumable services.
  • Object Storage Devices (OSDs): all OSDs together form the object store proper, and the binary objects that RADOS generates from the files to be stored reside in the store. The hierarchy within the OSDs is flat: objects with UUID-style names but no subfolders. (With the older FileStore backend an OSD was a folder within an existing filesystem; modern Ceph defaults to BlueStore on raw devices.) ceph-osd is the object storage daemon.
  • Monitor servers (MONs): MONs form the interface to the RADOS store and support access to the objects within the store. They handle communication with all external applications and work in a decentralized way.

Check Status:

$ ceph -s