
Still connecting to unix:///var/lib/kubelet/csi-plugins/*.csi.alibabacloud.com/csi.sock #1127

Open
lliiang opened this issue Aug 6, 2024 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@lliiang

lliiang commented Aug 6, 2024

What happened:

On two nodes in the cluster, the csi-plugin pod (csi-plugin-h4qhz) keeps reporting errors and restarting. Screenshots of the logs are below:

[Screenshots: csi-plugin error logs from the affected nodes]

The container logs are attached below:
csi-plugin-h4qhz-nas-driver-registrar.log
csi-plugin-h4qhz-disk-driver-registrar.log

csi-plugin-h4qhz-csi-plugin.log
csi-plugin-h4qhz-oss-driver-registrar.log

What you expected to happen:

The cluster has more than a dozen nodes, and only two of them report this error. The DaemonSet YAML is below:
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-plugin
  namespace: kube-system
  uid: 509d3cfc-0dbe-4ebd-8d79-3b8c52774d17
  resourceVersion: '601102482'
  generation: 5
  creationTimestamp: '2023-03-21T14:45:10Z'
  annotations:
    deprecated.daemonset.template.generation: '5'
spec:
  selector:
    matchLabels:
      app: csi-plugin
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: csi-plugin
      annotations:
        kubectl.kubernetes.io/restartedAt: '2024-06-19T22:22:37+08:00'
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      serviceAccountName: csi-admin
      serviceAccount: csi-admin
      hostPID: true
      schedulerName: default-scheduler
      hostNetwork: true
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: type
                    operator: NotIn
                    values:
                      - virtual-kubelet
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - name: disk-driver-registrar
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-node-driver-registrar:v2.3.1-038aeb6-aliyun
          args:
            - '--v=5'
            - '--csi-address=/var/lib/kubelet/csi-plugins/diskplugin.csi.alibabacloud.com/csi.sock'
            - '--kubelet-registration-path=/var/lib/kubelet/csi-plugins/diskplugin.csi.alibabacloud.com/csi.sock'
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 10m
              memory: 16Mi
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet
            - name: registration-dir
              mountPath: /registration
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
        - name: nas-driver-registrar
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-node-driver-registrar:v2.3.1-038aeb6-aliyun
          args:
            - '--v=5'
            - '--csi-address=/var/lib/kubelet/csi-plugins/nasplugin.csi.alibabacloud.com/csi.sock'
            - '--kubelet-registration-path=/var/lib/kubelet/csi-plugins/nasplugin.csi.alibabacloud.com/csi.sock'
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 10m
              memory: 16Mi
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
            - name: registration-dir
              mountPath: /registration
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
        - name: oss-driver-registrar
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-node-driver-registrar:v2.3.1-038aeb6-aliyun
          args:
            - '--v=5'
            - '--csi-address=/var/lib/kubelet/csi-plugins/ossplugin.csi.alibabacloud.com/csi.sock'
            - '--kubelet-registration-path=/var/lib/kubelet/csi-plugins/ossplugin.csi.alibabacloud.com/csi.sock'
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 10m
              memory: 16Mi
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
            - name: registration-dir
              mountPath: /registration
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
        - name: csi-plugin
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/csi-plugin:v1.24.9-74f8490-aliyun
          args:
            - '--endpoint=$(CSI_ENDPOINT)'
            - '--v=2'
            - '--driver=oss,nas,disk'
          env:
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix://var/lib/kubelet/csi-plugins/driverplugin.csi.alibabacloud.com-replace/csi.sock
            - name: MAX_VOLUMES_PERNODE
              value: '15'
            - name: SERVICE_TYPE
              value: plugin
            - name: ACCESS_KEY_ID
              value: LTAI5t6KKbiyequnsVeJHY55
            - name: ACCESS_KEY_SECRET
              value: S6UvK6rIVheVO4Y4fAiyVl2PZXNRMs
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 128Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: healthz
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 5
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
          ports:
            - name: healthz
              hostPort: 11260
              containerPort: 11260
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet/
              mountPropagation: Bidirectional
            - name: etc
              mountPath: /host/etc
            - name: host-log
              mountPath: /var/log/
            - name: ossconnectordir
              mountPath: /host/usr/
            - name: container-dir
              mountPath: /var/lib/container
              mountPropagation: Bidirectional
            - name: host-dev
              mountPath: /dev
              mountPropagation: HostToContainer
            - name: addon-token
              readOnly: true
              mountPath: /var/addon
            - name: fuse-metrics-dir
              mountPath: /host/var/run/
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      volumes:
        - name: fuse-metrics-dir
          hostPath:
            path: /var/run/
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry
            type: DirectoryOrCreate
        - name: container-dir
          hostPath:
            path: /var/lib/container
            type: DirectoryOrCreate
        - name: kubelet-dir
          hostPath:
            path: /var/lib/kubelet
            type: Directory
        - name: host-dev
          hostPath:
            path: /dev
            type: ''
        - name: host-log
          hostPath:
            path: /var/log/
            type: ''
        - name: etc
          hostPath:
            path: /etc
            type: ''
        - name: ossconnectordir
          hostPath:
            path: /usr/
            type: ''
        - name: addon-token
          secret:
            secretName: addon.csi.token
            items:
              - key: addon.token.config
                path: token-config
            defaultMode: 420
            optional: true
      dnsPolicy: ClusterFirst
      tolerations:
        - operator: Exists
      priorityClassName: system-node-critical
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 20%
      maxSurge: 0
  revisionHistoryLimit: 10
status:
  currentNumberScheduled: 15
  numberMisscheduled: 0
  desiredNumberScheduled: 15
  numberReady: 13
  observedGeneration: 5
  updatedNumberScheduled: 15
  numberAvailable: 13
  numberUnavailable: 2

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • CSI driver version (image tag of csi-plugin container):

  • Deployment method (where you got the YAML files, what modifications you made, etc.):

  • Kubernetes version (use kubectl version):
    k8s 1.26

  • Cloud provider or hardware configuration (e.g. Alibaba Cloud ECS instance type):
The cluster nodes are Alibaba Cloud ECS instances.

  • OS (e.g: cat /etc/os-release):

  • Kernel (e.g. uname -a):

  • Network plugin and version (if this is a network-related bug):

  • Others:

@lliiang lliiang added the kind/bug Categorizes issue or PR as related to a bug. label Aug 6, 2024
@huww98
Contributor

huww98 commented Aug 7, 2024

Why is your filesystem read-only? Is it intentional? What OS are you using?

@lliiang
Author

lliiang commented Aug 7, 2024

Why is your filesystem read-only? Is it intentional? What OS are you using?

My cluster is OpenShift 4.13.

The node OS is CoreOS.

Comparing logs between normal pods and abnormal pods.
[Screenshot: log comparison between a normal pod and an abnormal pod]

@huww98
Contributor

huww98 commented Aug 7, 2024

OK, maybe we should never write files into /usr, which is expected to be managed by the OS package manager.

You can try setting the env var DISABLE_CSIPLUGIN_CONNECTOR=true. Or upgrade the CSI plugin; we have limited the number of retries to 5.
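As a rough sketch (not an official procedure), that variable could be set on the DaemonSet posted above, assuming the plugin container is still named csi-plugin as in that spec:

# Sketch only: add the suggested variable to the csi-plugin container;
# the DaemonSet then rolls out restarted pods with it set.
kubectl -n kube-system set env daemonset/csi-plugin \
  --containers=csi-plugin DISABLE_CSIPLUGIN_CONNECTOR=true

The same effect can be had by adding the name/value pair to the env list of the csi-plugin container in the YAML directly.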

Comparing logs between normal pods and abnormal pods.

I think these logs come from different CSI versions.

@lliiang
Author

lliiang commented Aug 8, 2024

Hello, does csi-plugin have a debug log setting? How do I enable debug logging? I want to collect debug logs into our logging platform.

@huww98
Contributor

huww98 commented Aug 8, 2024

No. The default log level already outputs almost all the logs.
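If the goal is just to ship the existing logs to a platform, a rough sketch with standard kubectl (pod and container names taken from this issue; adjust them to your cluster):

# Sketch: collect current and previous logs from the crashing pod so they
# can be forwarded to a logging platform.
kubectl -n kube-system logs csi-plugin-h4qhz -c csi-plugin > csi-plugin.log
kubectl -n kube-system logs csi-plugin-h4qhz -c csi-plugin --previous > csi-plugin-previous.log
kubectl -n kube-system logs csi-plugin-h4qhz -c disk-driver-registrar > disk-driver-registrar.log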

@huww98
Contributor

huww98 commented Aug 8, 2024

OK, maybe we should never write files into /usr, which is expected to be managed by the OS package manager.

We decided not to fix this one, because we plan to remove the connector altogether in the future.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 6, 2024