WIP: Don't let calico modify CNI plugins #11780

Open
wants to merge 3 commits into
base: master

VannTen
Contributor

@VannTen VannTen commented Dec 10, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:

Special note to reviewers

The implementation of this PR needs to be changed, see #11780 (comment)

Which issue(s) this PR fixes:
Fixes #11747

Does this PR introduce a user-facing change?:

action required
The workaround variable `cni_bin_owner` is removed since the underlying issue with calico is fixed.

/label tide/merge-method-merge

This is not needed and has potentially bad side effects (setting the mode / owner of files/directories below).
Calico installs its CNI plugins itself
@k8s-ci-robot k8s-ci-robot added release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. kind/bug Categorizes issue or PR as related to a bug. tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 10, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 10, 2024
@VannTen
Contributor Author

VannTen commented Dec 10, 2024

@Rickkwa
@rptaylor
/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Dec 10, 2024
@Rickkwa
Contributor

Rickkwa commented Dec 13, 2024

Thanks, looks good to me. My original workaround didn't change any default behavior because I wasn't familiar with CNIs and didn't know whether that would have caused any side effects.

@@ -1,2 +0,0 @@
---
Member

Hi @VannTen

Is reverting #10929 required to fix the issue?

Contributor Author

Nope. I'm reverting it because it's a workaround which is no longer needed, and the behavior change for existing users:

  • Is pretty small
  • Should be irrelevant now.

@yankay
Member

yankay commented Dec 16, 2024

Hi @cyclinder
Would you please help review it?

@@ -447,7 +447,7 @@ downloads:
       - etcd
 
   cni:
-    enabled: true
+    enabled: "{{ kube_network_plugin != 'calico' }}"
Contributor

Hi @VannTen, why don't we need to install the CNI plugins when calico is enabled? I think we still need this: in some use cases, the user needs to run the macvlan/ipvlan CNI plugins and calico at the same time.

Contributor Author

Because calico installs its own and we end up with a race, see #11747.

Good point about the multi-net-plugin case though 🤔
/approve cancel

Contributor

The CNI and calico installations should be independent, not mutually exclusive. What they have in common is that they both share the hostPath /opt/cni/bin. Calico only installs the calico and calico-ipam binaries into /opt/cni/bin, so it expects /opt/cni/bin to be present on the host and creates it if it is not.

It seems that reverting #10407 should solve the problem, but I'm not sure what the owner variable does.

Contributor Author

The kube_owner was introduced for CIS compliance stuff, but I think it has been used a bit randomly since then 🤔 see #8952

Contributor Author

The CNI and calico installations should be independent, not mutually exclusive

That would make more sense to me, but it seems calico is saying otherwise: projectcalico/calico#6004 (comment) (unless I'm misreading the part about "installing upstream containernetworking/plugins binaries").

In that case, 81ee96a should suffice, I think 🤔

Contributor

but it seems calico is saying otherwise: projectcalico/calico#6004 (comment)

Calico installs these CNI plugins: flannel, bandwidth, host-local, portmap, etc. These plugins may be needed to work with calico (they are optional), so calico installs them by default. However, other plugins such as macvlan, ipvlan, etc. (https://github.com/containernetworking/plugins/tree/main/plugins) are not installed by calico, so we do need to install the full set of CNI plugins.

Contributor

In that case, 81ee96a should suffice, I think 🤔

What does the recurse variable do?

Contributor Author

It applies the file attributes to everything in that directory tree, see #11747 (comment)
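
To make the effect of `recurse` concrete, here is a minimal sketch of two alternative Ansible tasks. It is an illustration only, not the actual kubespray role: the task names and the `0755` mode are assumptions, while `/opt/cni/bin` and `kube_owner` come from this thread.

```yaml
# Hypothetical tasks for illustration; a role would use one or the other.

# Variant A: with recurse, owner and mode are applied to the directory
# AND to every file already inside it, including binaries placed there
# by another installer such as calico.
- name: Create CNI bin directory and force attributes on the whole tree
  ansible.builtin.file:
    path: /opt/cni/bin
    state: directory
    owner: "{{ kube_owner }}"
    mode: "0755"          # assumed value
    recurse: true

# Variant B: without recurse, only the directory itself is managed and
# existing files keep whatever owner and mode they already have.
- name: Create CNI bin directory without touching its contents
  ansible.builtin.file:
    path: /opt/cni/bin
    state: directory
    owner: "{{ kube_owner }}"
    mode: "0755"          # assumed value
```

In `ansible.builtin.file`, `recurse` only applies when `state` is `directory`, which is why both variants set it.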

Contributor Author

OK! Thanks for the explanation, I haven't looked closely at the CNI bits in K8s before 👍

In that case, I think we should instead keep our install (but still fix the recurse/permissions issues) and patch the calico manifests to not install the CNI.
It seems their CNI plugins are slightly different according to projectcalico/calico#6004 (comment) (bumped deps, etc.). I'm not sure whether we should ship their versions when using calico 🤔

The simpler path would be not to, that's for sure.

Contributor

It applies the file attributes to everything in that directory tree, see #11747 (comment)

All the CNI binaries must be executable, or we can have a task that changes the file attributes under /opt/cni/bin.

I think we should instead keep our install (but still fix the recurse/permissions issues)

Agreed.

and patch the calico manifests to not install the CNI.

That's difficult for us: Calico installs not only these CNI plugins (flannel, bandwidth, host-local, portmap, etc.) but also its own CNI plugins (calico and calico-ipam). It looks like the key to solving the problem is to change the attributes of these files.
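
A task that fixes the attributes of the files under /opt/cni/bin could look roughly like the sketch below. This is an assumption-laden illustration, not a proposed patch: the task names, the `cni_binaries` variable, and the `0755` mode are hypothetical, and it does not by itself address the race with calico writing into the same directory.

```yaml
# Hypothetical tasks for illustration only.
- name: List regular files in the CNI bin directory
  ansible.builtin.find:
    paths: /opt/cni/bin
    file_type: file
  register: cni_binaries      # hypothetical variable name

- name: Ensure every CNI binary is executable
  ansible.builtin.file:
    path: "{{ item.path }}"
    mode: "0755"              # assumed value
  loop: "{{ cni_binaries.files }}"
  loop_control:
    label: "{{ item.path }}"
```

Setting the mode per file keeps the change limited to what is actually in the directory at that moment, rather than recursing from a directory task.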

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mzaian for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2024
@VannTen VannTen changed the title Let calico install its own CNI plugin WIP: Let calico install its own CNI plugin Dec 17, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 17, 2024
@VannTen VannTen changed the title WIP: Let calico install its own CNI plugin WIP: Don't let calico modify CNI plugins Dec 17, 2024
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Address CNI installation race condition
5 participants