Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not panic when a default snapshotclass does not exist #1210

Merged

Conversation

leonardoce
Copy link
Contributor

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1203

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 22, 2024
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 22, 2024
@leonardoce
Copy link
Contributor Author

Please just close it if I don't understand the context.
In my opinion, we can reach createSnapshotWrapper for group snapshot members when we don't have a snapshot class.
In that case, we should not panic.

@leonardoce leonardoce force-pushed the panic-member-no-snapclass branch from 5fa26d0 to a12d2b5 Compare November 22, 2024 20:17
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 22, 2024
@leonardoce
Copy link
Contributor Author

The only other call to createSnapshotWrapper is in createSnapshot, and we already don't call it for group snapshot members:

https://github.com/kubernetes-csi/external-snapshotter/blob/master/pkg/sidecar-controller/snapshot_controller.go#L89-L93

@leonardoce leonardoce requested a review from xing-yang November 22, 2024 20:21
@gnufied
Copy link
Contributor

gnufied commented Nov 22, 2024

So while this indeed does make the panic disappear. It does not actually fixes underlying issue.

  boundVolumeSnapshotContentName: snapcontent-ea4accb874afdb52cfe72416e09a4ec35293f5c191499af491d3a1ca3982d42d
  error:
    message: Failed to set default snapshot class with error cannot find default snapshot
      class
    time: "2024-11-22T21:25:52Z"

What is worse is, when this happens - snapshotter is spamming the API server with volumesnapshot updates. See -

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2024-11-22T21:04:18Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  generation: 1
  labels:
    groupsnapshot.storage.k8s.io/volumeGroupSnapshotName: new-groupsnapshot-demo
  name: snapshot-ea4accb874afdb52cfe72416e09a4ec35293f5c191499af491d3a1ca3982d42d
  namespace: default
  resourceVersion: "7599"
  uid: 721e6795-4c7d-4516-b6d4-79da6ed0cf01
spec:
  source:
    persistentVolumeClaimName: csi-pvc
status:
  boundVolumeSnapshotContentName: snapcontent-ea4accb874afdb52cfe72416e09a4ec35293f5c191499af491d3a1ca3982d42d
  readyToUse: false
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2024-11-22T21:04:18Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  generation: 1
  labels:
    groupsnapshot.storage.k8s.io/volumeGroupSnapshotName: new-groupsnapshot-demo
  name: snapshot-ea4accb874afdb52cfe72416e09a4ec35293f5c191499af491d3a1ca3982d42d
  namespace: default
  resourceVersion: "7600"
  uid: 721e6795-4c7d-4516-b6d4-79da6ed0cf01
spec:
  source:
    persistentVolumeClaimName: csi-pvc
status:
  boundVolumeSnapshotContentName: snapcontent-ea4accb874afdb52cfe72416e09a4ec35293f5c191499af491d3a1ca3982d42d
  error:
    message: Failed to set default snapshot class with error cannot find default snapshot
      class
    time: "2024-11-22T21:26:30Z"

This objects's resourceVersion is 7600. That means, this object has been updated 7600 times in just last 5-10 minutes or so. This seems pretty weird.

@leonardoce
Copy link
Contributor Author

leonardoce commented Nov 22, 2024

I agree with you, @gnufied, but unfortunately, I cannot reproduce the behavior you're experiencing.
I can provision it successfully without any volumesnapshotclass.

This objects's resourceVersion is 7600. That means, this object has been updated 7600 times in just last 5-10 minutes or so. This seems pretty weird.

Could you let me know if you are sure about this? IIRC resourceVersion is not granted to start from 1.

$ k get volumesnapshot  snapshot-890272fe05591bd7f5fd7b3202955fa1b9b3b6f4c4eeb1d1d1daa4bba35db86f -o yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  creationTimestamp: "2024-11-22T21:52:20Z"
  finalizers:
  - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
  - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
  generation: 1
  labels:
    groupsnapshot.storage.k8s.io/volumeGroupSnapshotName: new-groupsnapshot-demo
  name: snapshot-890272fe05591bd7f5fd7b3202955fa1b9b3b6f4c4eeb1d1d1daa4bba35db86f
  namespace: default
  resourceVersion: "2769421"
  uid: 70b07ee6-ed2c-4c54-84c5-46fe3c340bd2
spec:
  source:
    persistentVolumeClaimName: cluster-example-1-wal
status:
  boundVolumeSnapshotContentName: snapcontent-890272fe05591bd7f5fd7b3202955fa1b9b3b6f4c4eeb1d1d1daa4bba35db86f
  creationTime: "2024-11-22T21:52:17Z"
  readyToUse: true
  restoreSize: 1Gi
  volumeGroupSnapshotName: new-groupsnapshot-demo

@gnufied
Copy link
Contributor

gnufied commented Nov 22, 2024

You are right that resourceVersion doesn't start from 1. But I see very frequent updates to volumesnapshot object:

snapshot-controller : I1122 16:55:09.550074 1559484 util.go:284] storeObjectUpdate updating snapshot "default/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241" with version 9032
snapshot-controller : I1122 16:55:09.747271 1559484 request.go:632] Waited for 197.153734ms due to client-side throttling, not priority and fairness, request: GET:https://192.168.1.108:6443/apis/snapshot.storage.k8s.io/v1/namespaces/default/volumesnapsh
ots/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241
snapshot-controller : I1122 16:55:09.946757 1559484 request.go:632] Waited for 198.276968ms due to client-side throttling, not priority and fairness, request: PUT:https://192.168.1.108:6443/apis/snapshot.storage.k8s.io/v1/namespaces/default/volumesnapshots/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241/status
snapshot-controller : I1122 16:55:09.949437 1559484 util.go:284] storeObjectUpdate updating snapshot "default/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241" with version 9033                                                   snapshot-controller : I1122 16:55:10.147201 1559484 request.go:632] Waited for 197.713516ms due to client-side throttling, not priority and fairness, request: GET:https://192.168.1.108:6443/apis/snapshot.storage.k8s.io/v1/namespaces/default/volumesnapsh
ots/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241                                                                                                                                                                                snapshot-controller : I1122 16:55:10.347235 1559484 request.go:632] Waited for 198.449458ms due to client-side throttling, not priority and fairness, request: PUT:https://192.168.1.108:6443/apis/snapshot.storage.k8s.io/v1/namespaces/default/volumesnapsh
ots/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241/status
snapshot-controller : I1122 16:55:10.350431 1559484 util.go:284] storeObjectUpdate updating snapshot "default/snapshot-e10e58934be6b25e706ea1cc371c18f08e4638bcff4c001b8f294fbaf39f9241" with version 9036                                                   snapshot-controller : I1122 16:55:10.381989 1559484 util.go:284] storeObjectUpdate updating groupsnapshot "default/new-groupsnapshot-demo" with version 9038

Which is concerning for me.

@gnufied
Copy link
Contributor

gnufied commented Nov 22, 2024

okay, this is looking better now:

apiVersion: v1
items:
- apiVersion: snapshot.storage.k8s.io/v1
  kind: VolumeSnapshot
  metadata:
    creationTimestamp: "2024-11-22T22:14:35Z"
    finalizers:
    - snapshot.storage.kubernetes.io/volumesnapshot-in-group-protection
    - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
    generation: 1
    labels:
      groupsnapshot.storage.k8s.io/volumeGroupSnapshotName: new-groupsnapshot-demo
    name: snapshot-bc3e5c8c83477504cb366a854e59b506ffa996fe652cfb58f9a93cd520640fb5
    namespace: default
    resourceVersion: "428"
    uid: 846e12d1-3d37-4721-9cee-205169df0959
  spec:
    source:
      persistentVolumeClaimName: csi-pvc
  status:
    boundVolumeSnapshotContentName: snapcontent-bc3e5c8c83477504cb366a854e59b506ffa996fe652cfb58f9a93cd520640fb5
    creationTime: "2024-11-22T22:14:35Z"
    readyToUse: true
    restoreSize: 1Gi
    volumeGroupSnapshotName: new-groupsnapshot-demo
kind: List
metadata:
  resourceVersion: ""

@gnufied
Copy link
Contributor

gnufied commented Nov 22, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 22, 2024
return ctrl.createSnapshotWrapper(content)
}

return content, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some tests for this? For now - unit tests will be good. In long term, we should have some e2e as well I think. I am not sure if all users will have a default snapshotclass when testing volumegroupsnapshot feature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue to track this: #1211

@xing-yang
Copy link
Collaborator

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: leonardoce, xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 22, 2024
@k8s-ci-robot k8s-ci-robot merged commit c5c0d15 into kubernetes-csi:master Nov 22, 2024
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using volumegroup snapshot requires a volumesnapshotclass?
4 participants