
The resource spec contents of pp and rb are not synchronized in time. #5996

Open
CharlesQQ opened this issue Dec 30, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Milestone
v1.13

Comments

@CharlesQQ
Member

CharlesQQ commented Dec 30, 2024

What happened:

  1. During a controller restart, the .spec.suspension fields of the RB and PP are different.

https://www.processon.com/v/6772523638625f07c0ec6273
Karmada grayscale release process

During the restart, the resource detector controller processes the currently released workload later than the binding controller, so the pause (suspension) setting becomes invalid. Our custom controller extension (for grayscale release) does not update the op resource, and the number of new pods exceeds the partition.

  2. After setting .spec.preserveResourcesOnDeletion=true on the PP and then deleting it immediately, the resources in the member clusters may still be cascade-deleted.

What you expected to happen:

The relevant fields in the PP, RB, and Work should be consistent. If the fields in the RB and Work are inconsistent with the PP, then subsequent operations on the RB and Work should not be performed until the relevant fields have been synchronized.

How to reproduce it (as minimally and precisely as possible):

  1. Have lots of resource templates and PPs, restart the controller, set .spec.suspension.dispatching=true, and check the RB field .spec.suspension.dispatching.
  2. Set the PP field .spec.preserveResourcesOnDeletion=true, delete it immediately, and check whether the resource has been cascade-deleted in the member cluster.

Anything else we need to know?:

Environment:

  • Karmada version:
    v1.12
  • kubectl-karmada or karmadactl version (the result of kubectl-karmada version or karmadactl version):
  • Others:
@CharlesQQ CharlesQQ added the kind/bug Categorizes issue or PR as related to a bug. label Dec 30, 2024
@CharlesQQ
Member Author

I have a solution, for reference only:

When the PP changes, the hash value of the current PP spec is calculated and added as an annotation by a webhook; when the resource detector updates the RB, the hash value is synchronized as well. Before reconciling, the binding controller first checks whether the hash values of the RB and PP are consistent; if they are not the same, it returns and waits. The path from RB to Work is similar.

This solves the problem of the gap between the PP and RB fields.

Reference for the hash calculation method, from OpenKruise:
https://github.com/openkruise/kruise/blob/e3e6d471a75737606e8cfe5338ad92bdddc72699/pkg/webhook/sidecarset/mutating/sidecarset_create_update_handler.go#L48-L66
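Below is a minimal sketch of the idea, for reference only. It is not Karmada's actual code: the annotation key policy.karmada.io/spec-hash and the function names are hypothetical, and only the Go standard library is used. The webhook would stamp the PP's spec hash, the resource detector would copy it onto the RB, and the binding controller would requeue until the two match:

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// specHashAnnotation is a hypothetical annotation key, used here for illustration only.
const specHashAnnotation = "policy.karmada.io/spec-hash"

// hashSpec returns a stable hash of an arbitrary spec, similar in spirit to
// the OpenKruise SidecarSet hash linked above.
func hashSpec(spec interface{}) string {
	data, _ := json.Marshal(spec)
	h := fnv.New32a()
	h.Write(data)
	return fmt.Sprintf("%x", h.Sum32())
}

// hashesInSync reports whether the RB already carries the hash the webhook
// stamped on the PP; if not, the binding controller should return and wait
// for the resource detector to finish propagating the PP change.
func hashesInSync(ppAnn, rbAnn map[string]string) bool {
	want, ok := ppAnn[specHashAnnotation]
	return ok && rbAnn[specHashAnnotation] == want
}

func main() {
	ppSpec := map[string]interface{}{"suspension": map[string]bool{"dispatching": true}}
	ppAnn := map[string]string{specHashAnnotation: hashSpec(ppSpec)} // set by the webhook
	rbAnn := map[string]string{}                                     // detector has not synced yet

	fmt.Println("in sync:", hashesInSync(ppAnn, rbAnn)) // false -> requeue and wait
}
```

The same check could be applied on the RB-to-Work path before the work is rewritten.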

@RainbowMango RainbowMango added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Dec 31, 2024
@RainbowMango RainbowMango added this to the v1.13 milestone Jan 3, 2025
Projects
Status: Accepted
Development

No branches or pull requests

2 participants