Kubernetes - Troubleshooting

Last Updated: 2024-01-20

... is attempting to grant RBAC permissions not currently held

Error:

Error from server (Forbidden): clusterroles.rbac.authorization.k8s.io "foo-cluster-role" is forbidden: user "<user-email>" (groups=["bar"]) is attempting to grant RBAC permissions not currently held:
{APIGroups:[""], Resources:["nodes"], Verbs:["list"]}

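Root cause: RBAC prevents privilege escalation, so a user can only create or update a role with permissions that the user already holds.

Solution: use kubectl patch to add the missing permission: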

$ kubectl patch clusterrole cluster-role-name \
  --kubeconfig ${KUBECONFIG} \
  --type='json' \
  -p='[{"op": "add", "path": "/rules/0", "value":{ "apiGroups": [""], "resources": ["nodes"], "verbs": ["list"]}}]'

If kubectl patch fails because the current user does not hold the permission either, and therefore cannot grant it to this ClusterRole, check your kubeconfig. If another context has higher permissions, switch to it:

$ kubectl config use-context admin-context

Then patch again.
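
Before patching again, it may help to confirm that the selected context actually holds the permission named in the error. The following is a minimal sketch, assuming the missing rule is "list nodes" as in the error above; the helper name holds_permission is hypothetical and not part of kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current context holds a permission.
# Usage: holds_permission VERB RESOURCE
holds_permission() {
  local verb="$1" resource="$2"
  # `kubectl auth can-i` prints "yes" or "no" for the current context.
  if [ "$(kubectl auth can-i "$verb" "$resource" 2>/dev/null)" = "yes" ]; then
    echo "current context can $verb $resource"
  else
    echo "current context cannot $verb $resource"
    return 1
  fi
}
```

For example, run holds_permission list nodes after kubectl config use-context admin-context. If it reports "cannot", that context will hit the same Forbidden error.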

Err:28: map[DriverName:filesystem Enclosed:map[Err:28 Op:mkdir ...

Error: Err:28

map[DriverName:filesystem Enclosed:map[Err:28 Op:mkdir Path:/var/lib/registry/docker/registry/v2/repositories/<project>/<repository>]]

Root cause: Err 28 is ENOSPC (no space left on device). The registry volume is full.

Verification: check the disk space of the Harbor registry pod:

$ kubectl -n HARBOR_NAMESPACE exec HARBOR_REGISTRY_POD_NAME -- df -ah | less

Solution: increase the size of the registry's volume.

# Get the pod
POD=$(kubectl get pods -n HARBOR_NAMESPACE -l goharbor.io/operator-controller=registry -o name --kubeconfig=/path/to/kubeconfig)

# Set the new size
STORAGE_SIZE=400Gi

# Patch the PVC (the StorageClass must allow volume expansion)
kubectl patch persistentvolumeclaim/harbor-registry \
    --kubeconfig=/path/to/kubeconfig \
    -n HARBOR_NAMESPACE --type=merge \
    -p '{"spec": {"resources": {"requests": {"storage": "'"$STORAGE_SIZE"'"}}}}'

# Wait until the storage capacity is changed
kubectl --kubeconfig=/path/to/kubeconfig -n HARBOR_NAMESPACE exec "$POD" -- df -ah | grep "/var/lib/registry"
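
The "wait until the storage capacity is changed" step can also be polled from the PVC status instead of running df by hand. This is only a sketch: the function name wait_for_pvc_size and its retry defaults are assumptions, and it assumes the PVC reports its capacity in the same form as the request (for example 400Gi).

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a PVC until its reported capacity matches.
# Usage: wait_for_pvc_size NAMESPACE PVC WANTED_SIZE [TRIES] [DELAY_SECONDS]
wait_for_pvc_size() {
  local ns="$1" pvc="$2" want="$3" tries="${4:-30}" delay="${5:-10}"
  local got="" i=0
  while [ "$i" -lt "$tries" ]; do
    # .status.capacity.storage is the size the volume actually has right now.
    got=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.status.capacity.storage}')
    if [ "$got" = "$want" ]; then
      echo "PVC $pvc is now $got"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "PVC $pvc is still ${got:-unknown}, expected $want" >&2
  return 1
}
```

For example: wait_for_pvc_size HARBOR_NAMESPACE harbor-registry "$STORAGE_SIZE". If the size never changes, kubectl describe pvc usually shows the reason in its events.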

Err:30

Error: Err:30

map[DriverName:filesystem Enclosed:map[Err:30 Op:mkdir Path:/var/lib/registry/docker/registry/v2/repositories/<project>/<repository>]]

Root cause: Err 30 is EROFS (read-only file system). The registry tried to write to a volume that is mounted read-only.

Verification:

# Get the pod.
POD=$(kubectl get pods -n HARBOR_NAMESPACE -l goharbor.io/operator-controller=registry -o name --kubeconfig=/path/to/kubeconfig)

kubectl --kubeconfig=/path/to/kubeconfig -n HARBOR_NAMESPACE exec $POD -- mount | grep /var/lib/registry

# Check if it is mounted as `ro`.

Solution: delete the pod so that it is recreated, then check that the volume is mounted as rw.

Object stuck in Terminating Status

Check the finalizers of the object. An object is not removed until its metadata.finalizers field is empty.

The target object remains in a terminating state while the control plane, or other components, take the actions defined by the finalizers.

https://kubernetes.io/docs/concepts/overview/working-with-objects/finalizers/
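
To see which finalizers are still pending, the object's metadata can be printed directly. A minimal sketch follows; the function name show_finalizers is hypothetical, and the namespace argument is simply ignored by kubectl for cluster-scoped kinds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the deletion timestamp and remaining finalizers.
# Usage: show_finalizers KIND NAME [NAMESPACE]
show_finalizers() {
  local kind="$1" name="$2" ns="${3:-default}"
  # First line: when deletion was requested (empty if not being deleted).
  # Second line: finalizers that still block removal.
  kubectl -n "$ns" get "$kind" "$name" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}
```

Each finalizer name usually points at the controller responsible for it, so a finalizer that never disappears suggests checking whether that controller is running and what its logs say.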

message: 'The node was low on resource: ephemeral-storage.

Error

Pods are failing:

"message: 'The node was low on resource: ephemeral-storage."

Debug

Check disk usage

$ df -h

If the disk is indeed full, check what is taking up space under /var/lib/kubelet or /var/log.
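
One way to find the largest consumers is to rank the first-level subdirectories by size. The sketch below assumes a du that supports -d (GNU and BSD both do); the function name largest_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the largest first-level entries under a directory.
# Usage: largest_dirs DIRECTORY [COUNT]
largest_dirs() {
  local dir="$1" count="${2:-10}"
  # -x stays on one filesystem, -k reports sizes in KiB, -d 1 limits the depth.
  # The first row is the total for DIRECTORY itself.
  du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, run sudo bash and then largest_dirs /var/lib/kubelet and largest_dirs /var/log on the affected node.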

no kind is registered for the type ... in scheme ...

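Register the missing types with the scheme by calling AddToScheme():

import (
  foov1 "path/to/foo/v1"
  "k8s.io/apimachinery/pkg/runtime"
  runtimeutil "k8s.io/apimachinery/pkg/util/runtime"
)

// scheme must be the same scheme that is passed to the client or manager.
var scheme = runtime.NewScheme()

// Register the foo/v1 types; Must panics if registration fails.
func init() { runtimeutil.Must(foov1.AddToScheme(scheme)) }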

too many open files

Check the current inotify limits:

$ sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches

# or
$ cat /proc/sys/fs/inotify/max_user_watches
$ cat /proc/sys/fs/inotify/max_user_instances

To increase temporarily:

$ sudo sysctl fs.inotify.max_user_watches=524288
$ sudo sysctl fs.inotify.max_user_instances=512

To make the changes persistent, add these lines to /etc/sysctl.conf (run sudo sysctl -p afterwards to apply them without a reboot):

fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512

Count the inotify watches held by the current user:

find /proc/*/fd -user "$USER" -lname anon_inode:inotify \
 -printf '%hinfo/%f\n' 2>/dev/null |
xargs cat | grep -c '^inotify'

"timed out waiting for cache to be synced"

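Possible causes: the CRD of a watched resource is not installed, or the controller's service account lacks the RBAC permissions to list and watch it.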
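
Both causes can be checked from the command line. This is a sketch under assumptions: the function name check_cache_prereqs is hypothetical, and the CRD name, resource, and service account must be taken from your own controller. The can-i check applies to the current namespace unless a -n or -A flag is added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the two usual causes of a cache sync timeout.
# Usage: check_cache_prereqs CRD_NAME RESOURCE SERVICE_ACCOUNT_USER
#   e.g. check_cache_prereqs foos.example.com foos.example.com \
#        system:serviceaccount:NAMESPACE:SERVICE_ACCOUNT
check_cache_prereqs() {
  local crd="$1" resource="$2" sa="$3" verb
  if kubectl get crd "$crd" >/dev/null 2>&1; then
    echo "CRD $crd: installed"
  else
    echo "CRD $crd: missing"
  fi
  # Informers need both list and watch on the resource.
  for verb in list watch; do
    echo "$verb $resource as $sa: $(kubectl auth can-i "$verb" "$resource" --as="$sa" 2>/dev/null)"
  done
}
```

Impersonating the service account with --as requires that your own user is allowed to impersonate it, which cluster admins typically are.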

failed to call webhook: the server could not find the requested resource

  • Check your ValidatingWebhookConfiguration objects.
  • Check the Service of the webhook.
  • Check the Deployment of the webhook backend: see whether it is up and running, and whether it is too busy to respond.
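
The checks above can be sketched as two small helpers. The function names are hypothetical, and the first helper assumes the webhooks reference an in-cluster Service rather than a URL (for URL-based webhooks the service columns are empty).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each validating webhook with the Service it calls.
list_webhook_services() {
  kubectl get validatingwebhookconfigurations \
    -o jsonpath='{range .items[*].webhooks[*]}{.name}{"\t"}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{"\n"}{end}'
}

# Hypothetical helper: show the Service, its endpoints, and the Deployments
# in the same namespace.
# Usage: webhook_backend_status NAMESPACE SERVICE
webhook_backend_status() {
  local ns="$1" svc="$2"
  kubectl -n "$ns" get service,endpoints "$svc"
  kubectl -n "$ns" get deployments
}
```

A Service with no endpoints usually means the backend pods are not ready. If everything looks healthy, the logs of the backend pods are the next place to look.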