topolvm-controller

topolvm-controller provides a CSI controller service. It also works as a custom Kubernetes controller for additional tasks.

CSI Controller Features

topolvm-controller implements the following optional features:

Webhooks

topolvm-controller implements two webhooks:

/pod/mutate

Mutate new Pods to add capacity.topolvm.io/<device-class> annotations to the Pod and a topolvm.io/capacity resource request to its first container. topolvm-scheduler uses these annotations and the resource request to filter and score Nodes.

This hook handles two classes of pods. The first is pods that have at least one unbound PersistentVolumeClaim (PVC) for TopoLVM and no bound PVC for TopoLVM. The second is pods that have at least one generic ephemeral volume that uses a TopoLVM StorageClass.
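The two pod classes can be expressed as a single predicate. The following is a minimal sketch in plain Python over dicts shaped like Kubernetes objects, not TopoLVM's actual Go implementation. It assumes a StorageClass counts as "TopoLVM" when its provisioner is topolvm.io (as in the StorageClass example in this document), and the `pvcs` and `provisioners` lookups and the `is_target_pod` name are hypothetical stand-ins for API-server queries.

```python
from typing import Dict, List, Optional

# Assumption: a StorageClass is a TopoLVM StorageClass when its provisioner
# is topolvm.io, matching the StorageClass example in this document.
TOPOLVM_PROVISIONER = "topolvm.io"


def is_target_pod(volumes: List[dict],
                  pvcs: Dict[str, dict],
                  provisioners: Dict[str, str]) -> bool:
    """Return True if /pod/mutate would handle a pod with these volumes.

    volumes:      pod.spec.volumes entries as plain dicts.
    pvcs:         hypothetical lookup, claimName -> {"storageClassName", "bound"}.
    provisioners: hypothetical lookup, StorageClass name -> provisioner.
    """
    def uses_topolvm(sc_name: Optional[str]) -> bool:
        return provisioners.get(sc_name) == TOPOLVM_PROVISIONER

    unbound = bound = ephemeral = 0
    for vol in volumes:
        if "persistentVolumeClaim" in vol:
            pvc = pvcs.get(vol["persistentVolumeClaim"]["claimName"])
            if pvc is None or not uses_topolvm(pvc.get("storageClassName")):
                continue  # unknown PVC, or a PVC that is not for TopoLVM
            if pvc["bound"]:
                bound += 1
            else:
                unbound += 1
        elif "ephemeral" in vol:
            spec = vol["ephemeral"]["volumeClaimTemplate"]["spec"]
            if uses_topolvm(spec.get("storageClassName")):
                ephemeral += 1

    # Class 1: at least one unbound TopoLVM PVC and no bound TopoLVM PVC.
    # Class 2: at least one generic ephemeral volume using a TopoLVM StorageClass.
    return (unbound > 0 and bound == 0) or ephemeral > 0


pvc_volume = {"name": "my-volume1", "persistentVolumeClaim": {"claimName": "local-pvc1"}}
print(is_target_pod([pvc_volume],
                    {"local-pvc1": {"storageClassName": "topolvm", "bound": False}},
                    {"topolvm": "topolvm.io"}))
# True
```

Counting bound and unbound PVCs separately keeps the "no bound PVC" condition explicit: a single already-bound TopoLVM PVC excludes the pod from the first class, because its node is already fixed by that volume.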

For both PVCs and generic ephemeral volumes, the requested storage size for the volume is calculated as follows:

  • If the volume has no storage request, its size is treated as 1 GiB.
  • If the volume has a storage request, that requested size is used as is.

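The value of each capacity.topolvm.io/<device-class> annotation is the sum, in bytes, of these sizes over the pod's unbound TopoLVM PVCs and generic ephemeral volumes of that device class. The topolvm.io/capacity resource request itself is set to "1", as in the examples below.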
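The sizing and summing rules can be sketched as follows. This is a hypothetical model, not TopoLVM's code: each volume is reduced to a `(device_class, storage_request)` pair, and the helper `parse_binary_quantity` understands only plain byte counts and the binary suffixes Ki, Mi, Gi, and Ti, whereas real Kubernetes quantities accept more forms.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

GIB = 1024 ** 3
# This sketch handles only plain integers and binary suffixes; real Kubernetes
# quantities also allow decimal suffixes, exponents, and fractions.
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}


def parse_binary_quantity(q: str) -> int:
    """Parse a subset of Kubernetes quantities such as "512Mi" or "1Gi" into bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain byte count


def requested_bytes(storage_request: Optional[str]) -> int:
    """No storage request means 1 GiB; otherwise the request is used as is."""
    if storage_request is None:
        return GIB
    return parse_binary_quantity(storage_request)


def capacity_by_device_class(volumes: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, int]:
    """Sum requested sizes of the pod's unbound TopoLVM volumes per device class."""
    totals: Dict[str, int] = defaultdict(int)
    for device_class, storage_request in volumes:
        totals[device_class] += requested_bytes(storage_request)
    return dict(totals)


print(capacity_by_device_class([("ssd", "1Gi")]))
# {'ssd': 1073741824}
```

The single 1Gi PVC in the manifest below yields 1073741824 bytes, which is the value that appears in the capacity.topolvm.io/ssd annotation of the mutated Pod.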

The following manifest exemplifies usage of TopoLVM PVCs:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: topolvm
provisioner: topolvm.io            # topolvm-scheduler works only for StorageClass with this provisioner.
parameters:
  "csi.storage.k8s.io/fstype": "xfs"
  "topolvm.io/device-class": "ssd"
volumeBindingMode: WaitForFirstConsumer
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: local-pvc1
  namespace: hook-test
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: topolvm                # reference the above StorageClass
---
apiVersion: v1
kind: Pod
metadata:
  name: pause
  namespace: hook-test
  labels:
    app.kubernetes.io/name: pause
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause
    volumeMounts:
    - mountPath: /test1
      name: my-volume1
  volumes:
  - name: my-volume1
    persistentVolumeClaim:
      claimName: local-pvc1                # reference the above PVC

The hook adds capacity.topolvm.io/<device-class> to the Pod's annotations and topolvm.io/capacity to the resources of the first container as follows:

metadata:
  annotations:
    capacity.topolvm.io/ssd: "1073741824"
spec:
  containers:
  - name: pause
    resources:
      limits:
        topolvm.io/capacity: "1"
      requests:
        topolvm.io/capacity: "1"

If the specified StorageClass does not set the topolvm.io/device-class parameter, the Pod is annotated with capacity.topolvm.io/00default instead.
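The shape of the mutation shown above, including the 00default fallback, can be sketched on plain dicts. This is an illustrative model under stated assumptions, not the webhook's real code: `annotation_key` and `mutate_pod` are hypothetical names, the capacity totals are assumed to be computed already, and the patch is applied by returning a modified copy rather than by emitting a JSON patch.

```python
import copy
from typing import Dict

DEFAULT_DEVICE_CLASS = "00default"
CAPACITY_ANNOTATION_PREFIX = "capacity.topolvm.io/"
CAPACITY_RESOURCE = "topolvm.io/capacity"


def annotation_key(storage_class_params: Dict[str, str]) -> str:
    """Annotation key for a StorageClass; falls back to 00default without device-class."""
    device_class = storage_class_params.get("topolvm.io/device-class", DEFAULT_DEVICE_CLASS)
    return CAPACITY_ANNOTATION_PREFIX + device_class


def mutate_pod(pod: dict, capacity: Dict[str, int]) -> dict:
    """Return a copy of pod with capacity annotations and the capacity resource.

    capacity maps annotation key -> total requested bytes (hypothetical input).
    """
    mutated = copy.deepcopy(pod)  # leave the caller's object untouched
    annotations = mutated.setdefault("metadata", {}).setdefault("annotations", {})
    for key, size in capacity.items():
        annotations[key] = str(size)  # annotation values are strings
    first = mutated["spec"]["containers"][0]
    resources = first.setdefault("resources", {})
    for field in ("requests", "limits"):
        resources.setdefault(field, {})[CAPACITY_RESOURCE] = "1"
    return mutated


pod = {"metadata": {"name": "pause"},
       "spec": {"containers": [{"name": "pause", "image": "registry.k8s.io/pause"}]}}
key = annotation_key({"csi.storage.k8s.io/fstype": "xfs", "topolvm.io/device-class": "ssd"})
print(mutate_pod(pod, {key: 1073741824})["metadata"]["annotations"])
# {'capacity.topolvm.io/ssd': '1073741824'}
```

Only the first container receives the topolvm.io/capacity request, so other containers in the Pod are left unchanged by this sketch.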

Below is an example for TopoLVM generic ephemeral volumes:

apiVersion: v1
kind: Pod
metadata:
  name: pause
  labels:
    app.kubernetes.io/name: pause
spec:
  containers:
  - name: pause
    image: registry.k8s.io/pause
    volumeMounts:
    - mountPath: /test1
      name: my-volume
  volumes:
  - name: my-volume
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
          storageClassName: topolvm # reference the above StorageClass

The hook adds capacity.topolvm.io/<device-class> to the Pod's annotations and topolvm.io/capacity to the resources of the first container as follows:

metadata:
  annotations:
    capacity.topolvm.io/ssd: "1073741824"
spec:
  containers:
  - name: pause
    resources:
      limits:
        topolvm.io/capacity: "1"
      requests:
        topolvm.io/capacity: "1"

/pvc/mutate

Mutate new PVCs to add the topolvm.io/pvc finalizer. This finalizer is required to delete a pod in the following scenario:

  1. A StatefulSet pod is evicted by kubectl drain, and its PVC remains.
  2. The StatefulSet controller recreates the pod, but the pod is not scheduled for some reason.
  3. The Node resource on which the pod was running is deleted.
  4. topolvm-controller deletes the PVC associated with that Node.

At step 4, the recreated StatefulSet pod is not deleted unless the PVC has this finalizer.
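The mutation itself is small: the finalizer is appended once, and any finalizers already on the PVC are kept. The following is a minimal sketch on a plain dict; `add_pvc_finalizer` is a hypothetical name, and the sketch does not model how the real webhook decides which PVCs to mutate.

```python
import copy

PVC_FINALIZER = "topolvm.io/pvc"


def add_pvc_finalizer(pvc: dict) -> dict:
    """Return a copy of pvc with the topolvm.io/pvc finalizer present exactly once."""
    mutated = copy.deepcopy(pvc)
    finalizers = mutated.setdefault("metadata", {}).setdefault("finalizers", [])
    if PVC_FINALIZER not in finalizers:  # idempotent: safe to apply twice
        finalizers.append(PVC_FINALIZER)
    return mutated


print(add_pvc_finalizer({"metadata": {"name": "local-pvc1"}})["metadata"]["finalizers"])
# ['topolvm.io/pvc']
```

Keeping the mutation idempotent matters because admission webhooks may be invoked more than once for the same object.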

Controllers for Kubernetes Objects

The Controller for Nodes

This controller cleans up all PVCs and LogicalVolumes associated with a Node that is being deleted.

It adds the topolvm.io/node finalizer to Nodes so that the cleanup task runs before a Node is removed.

This node finalization can be skipped with the --skip-node-finalize flag. When the flag is true, a cluster administrator must manually delete the PVCs and LogicalVolume CRs of a deleted Node.
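The finalizer-driven cleanup follows the usual reconcile pattern: add the finalizer while the Node is alive, and clean up and then remove the finalizer once the Node is being deleted. The following is a hypothetical in-memory sketch, not TopoLVM's controller. The `Node` and `Cluster` types are illustrative, objects are tracked as name-to-node-name maps, and the sketch assumes that --skip-node-finalize disables both adding the finalizer and the cleanup.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NODE_FINALIZER = "topolvm.io/node"


@dataclass
class Node:
    name: str
    deleting: bool = False  # stands in for a set deletionTimestamp
    finalizers: List[str] = field(default_factory=list)


@dataclass
class Cluster:
    # Hypothetical store: object name -> name of the Node it belongs to.
    pvcs: Dict[str, str] = field(default_factory=dict)
    logical_volumes: Dict[str, str] = field(default_factory=dict)


def reconcile_node(node: Node, cluster: Cluster, skip_node_finalize: bool = False) -> None:
    if skip_node_finalize:
        return  # assumption: cleanup is left entirely to the administrator
    if not node.deleting:
        if NODE_FINALIZER not in node.finalizers:
            node.finalizers.append(NODE_FINALIZER)
        return
    if NODE_FINALIZER not in node.finalizers:
        return  # nothing left to do for this Node
    for store in (cluster.pvcs, cluster.logical_volumes):
        for name in [n for n, owner in store.items() if owner == node.name]:
            del store[name]
    node.finalizers.remove(NODE_FINALIZER)  # lets the Node deletion proceed
```

For example, after `node.deleting = True`, one call to `reconcile_node(node, cluster)` removes that Node's PVCs and LogicalVolumes from the store and drops the finalizer, leaving objects on other Nodes untouched.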

The Controller for PersistentVolumeClaims

The controller accomplishes two tasks.

  1. Delete Pods using PVCs under deletion. When a PVC for TopoLVM is being deleted, the controller deletes any pods referencing the PVC. This is repeated until all other finalizers have completed. Once topolvm.io/pvc is the last remaining finalizer, the controller removes it so that the PVC is deleted immediately.

  2. Speed up resizing a PVC filesystem by nudging kubelet. kubelet resizes filesystems while periodically processing Pods rather than in response to PVC changes, so filesystem resizing may be delayed. To avoid this, the controller notifies kubelet by setting the topolvm.io/last-resizefs-requested-at annotation on the Pod to the current time.
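Both tasks can be sketched in a few lines. This is a hypothetical in-memory model, not the controller's real code: `PVC`, `Pod`, `reconcile_deleting_pvc`, and `nudge_kubelet` are illustrative names, "deleting pods" is modeled as returning the pods that remain, and the timestamp format (ISO 8601 in UTC) is an assumption. kubernetes.io/pvc-protection appears only as sample data for "another finalizer".

```python
import datetime
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PVC_FINALIZER = "topolvm.io/pvc"
RESIZEFS_ANNOTATION = "topolvm.io/last-resizefs-requested-at"


@dataclass
class PVC:
    name: str
    deleting: bool = False
    finalizers: List[str] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    claim_names: List[str]
    annotations: Dict[str, str] = field(default_factory=dict)


def reconcile_deleting_pvc(pvc: PVC, pods: List[Pod]) -> List[Pod]:
    """Task 1: one reconcile pass; returns the pods that remain afterwards."""
    if not pvc.deleting or PVC_FINALIZER not in pvc.finalizers:
        return pods
    remaining = [p for p in pods if pvc.name not in p.claim_names]  # "delete" users
    if pvc.finalizers == [PVC_FINALIZER]:
        pvc.finalizers.remove(PVC_FINALIZER)  # last finalizer: let the PVC go
    return remaining


def nudge_kubelet(pvc_name: str, pods: List[Pod],
                  now: Optional[datetime.datetime] = None) -> None:
    """Task 2: stamp every pod using the PVC so kubelet processes it again."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    for pod in pods:
        if pvc_name in pod.claim_names:
            pod.annotations[RESIZEFS_ANNOTATION] = now.isoformat()
```

Running `reconcile_deleting_pvc` repeatedly mirrors the "repeat until other finalizers complete" behavior: pods are removed on every pass, but topolvm.io/pvc is dropped only once it is the last finalizer left.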

Command-line flags

| Name | Type | Default | Description |
|------|------|---------|-------------|
| cert-dir | string | /tmp/k8s-webhook-server/serving-certs | Directory for tls.crt and tls.key files. |
| csi-socket | string | /run/topolvm/csi-topolvm.sock | UNIX domain socket of topolvm-controller. |
| metrics-bind-address | string | :8080 | Listen address for Prometheus metrics. |
| secure-metrics-server | bool | false | Secures the metrics server. |
| leader-election-id | string | topolvm | ID for leader election by controller-runtime. |
| webhook-addr | string | :9443 | Listen address for the webhook endpoint. |
| skip-node-finalize | bool | false | When true, skips automatic cleanup of PersistentVolumeClaims and LogicalVolumes on Node deletion. |