WIP: pkg/agent: wait for all volumes to be detached before rebooting
This commit provides a PoC implementation of the agent waiting for all
volumes attached to the node to be detached, as a step after draining
the node. Shutting down a Pod does not mean its volume has been
detached: the CSI agent usually runs as a DaemonSet on the node and
takes care of detaching the volume from the node once the pod shuts
down.

This commit improves the reboot experience. Right now, if the CSI agent
does not have enough time to detach the volumes from the node, the node
gets rebooted and pods using the still-attached volumes cannot move to
other nodes, which effectively increases the downtime for stateful
workloads.

This commit still requires tests and a better interface for users.

If someone wants to try this feature on their own cluster, I've
published the following image I've been testing with:

quay.io/invidian/flatcar-linux-update-operator:97c0dee50c807dbba7d2debc59b369f84002797e

Closes #30

Signed-off-by: Mateusz Gozdek <[email protected]>
invidian committed Jan 11, 2023
1 parent f45ff7c commit 88957e7
Showing 2 changed files with 30 additions and 0 deletions.
6 changes: 6 additions & 0 deletions examples/deploy/rbac/cluster-role.yaml
@@ -47,3 +47,9 @@ rules:
      - daemonsets
    verbs:
      - get
  - apiGroups:
      - storage.k8s.io
    resources:
      - volumeattachments
    verbs:
      - list
24 changes: 24 additions & 0 deletions pkg/agent/agent.go
@@ -290,6 +290,30 @@ func (k *klocksmith) process(ctx context.Context) error {

	klog.Info("Node drained, rebooting")

	for {
		attachments, err := k.clientset.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
		if err != nil {
			klog.Errorf("Listing volume attachments: %v", err)
			continue
		}

		anyVolumeAttached := false

		for _, attachment := range attachments.Items {
			if attachment.Status.Attached && attachment.Spec.NodeName == k.nodeName {
				anyVolumeAttached = true
				klog.Infof("Volume %q is still attached, waiting for detach", attachment.Name)
			}
		}

		if !anyVolumeAttached {
			klog.Info("All volumes are detached from node, rebooting.")
			break
		}

		time.Sleep(5 * time.Second)
	}

	// Reboot.
	k.lc.Reboot(false)
