Running linkerd-proxy as a native sidecar fails for some argo workflow pods #13349

Open
bwmetcalf opened this issue Nov 19, 2024 · 2 comments

@bwmetcalf

What is the issue?

We are injecting linkerd-proxy into our Argo Workflows pods as a native sidecar using the annotation

config.alpha.linkerd.io/proxy-enable-native-sidecar: true
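For context, here is a minimal sketch of how this annotation might be applied to every pod of a workflow. It assumes the annotation is set through the workflow-level `spec.podMetadata` field and that injection is enabled with `linkerd.io/inject: enabled`. The issue does not say how injection is turned on, and the workflow name and step are hypothetical. Kubernetes annotation values must be strings, so `true` has to be quoted in YAML:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: example-          # hypothetical workflow name
spec:
  entrypoint: main
  podMetadata:
    annotations:
      # Assumption: injection enabled per pod; it may instead come from the namespace.
      linkerd.io/inject: enabled
      # Annotation values are strings, so the boolean is quoted.
      config.alpha.linkerd.io/proxy-enable-native-sidecar: "true"
  templates:
    - name: main
      container:
        image: alpine:3.20        # hypothetical step
        command: ["sh", "-c", "echo hello"]
```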

One of our workflows spins up multiple pods. It runs fine through the first two or three pods, which contain multiple steps, but in one of the pods linkerd-proxy exits with code 137 (SIGKILL), and these are the last events for that pod:

 Normal   Killing    37m   kubelet            Stopping container linkerd-proxy
 Warning  Unhealthy  37m   kubelet            Readiness probe failed: Get "http://10.3.175.1:4191/ready": dial tcp 10.3.175.1:4191: connect: connection refused

This causes the Argo server to mark the step as failed, which fails the entire workflow. All of the preceding pods in the workflow have only

 Normal   Killing    37m   kubelet            Stopping container linkerd-proxy

as their last event. For whatever reason, this particular pod seems to hit a race condition in which the readiness probe is still running while the proxy container is shutting down.

Are there corresponding parameters that should be tweaked when injecting linkerd-proxy as a native sidecar?

How can it be reproduced?

This isn't clear yet. I don't have a test case, as these are fairly complex workflows.

Logs, error output, etc

See above.

Output of linkerd check -o short

% linkerd check -o short
linkerd-version
---------------
‼ cli is up-to-date
    unsupported version channel: stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 24.11.3 but the latest edge version is 24.11.4
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-5ddc58f9bc-5x9nh (edge-24.11.3)
	* linkerd-destination-5ddc58f9bc-7gkdk (edge-24.11.3)
	* linkerd-destination-5ddc58f9bc-9c99t (edge-24.11.3)
	* linkerd-destination-5ddc58f9bc-brbh5 (edge-24.11.3)
	* linkerd-destination-5ddc58f9bc-ffmdx (edge-24.11.3)
	* linkerd-identity-85fb8c4b5f-c6l7m (edge-24.11.3)
	* linkerd-identity-85fb8c4b5f-ctr4h (edge-24.11.3)
	* linkerd-identity-85fb8c4b5f-jhp8q (edge-24.11.3)
	* linkerd-identity-85fb8c4b5f-nzx8w (edge-24.11.3)
	* linkerd-identity-85fb8c4b5f-vfmkc (edge-24.11.3)
	* linkerd-proxy-injector-5497b8cb97-fw85c (edge-24.11.3)
	* linkerd-proxy-injector-5497b8cb97-g22xn (edge-24.11.3)
	* linkerd-proxy-injector-5497b8cb97-g2m2v (edge-24.11.3)
	* linkerd-proxy-injector-5497b8cb97-gjfwv (edge-24.11.3)
	* linkerd-proxy-injector-5497b8cb97-jwrnl (edge-24.11.3)
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-5ddc58f9bc-5x9nh running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-cp-proxy-cli-version for hints

linkerd-ha-checks
-----------------
‼ pod injection disabled on kube-system
    kube-system namespace needs to have the label config.linkerd.io/admission-webhooks: disabled if injector webhook failure policy is Fail
    see https://linkerd.io/2.14/checks/#l5d-injection-disabled for hints

linkerd-viz
-----------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* metrics-api-5789bcc5d-2zdck (edge-24.11.3)
	* prometheus-9c78c7f55-7q88p (edge-24.11.3)
	* tap-6688cddf94-st2jc (edge-24.11.3)
	* tap-injector-85b47576fc-9k222 (edge-24.11.3)
	* web-8c5b96b6-s7ggv (edge-24.11.3)
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    metrics-api-5789bcc5d-2zdck running edge-24.11.3 but cli running stable-2.14.10
    see https://linkerd.io/2.14/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

Server Version: v1.29.8-eks-a737599

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None

bwmetcalf added the bug label Nov 19, 2024
@fullykubed

@bwmetcalf I believe you are suffering from this issue: Panfactum/stack#164

@bwmetcalf

bwmetcalf commented Nov 23, 2024

Thanks, I'll give it a try and report back. For now, we are not injecting the proxy as a native sidecar; instead we use the following annotation:

      workflows.argoproj.io/kill-cmd-linkerd-proxy: '["/usr/lib/linkerd/linkerd-await","sleep","1","--shutdown"]'
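A sketch of where this workaround might sit in a Workflow spec, assuming it is applied through `spec.podMetadata` like any other pod annotation and that the native-sidecar annotation is simply left off. As we understand it, Argo reads `workflows.argoproj.io/kill-cmd-<container-name>` to decide which command to run in the named sidecar when it needs to stop it. The `linkerd.io/inject` line is an assumption about how injection is enabled:

```yaml
# Fragment of a Workflow spec (argoproj.io/v1alpha1), not a complete manifest.
spec:
  podMetadata:
    annotations:
      linkerd.io/inject: enabled   # assumption: per-pod injection
      # No proxy-enable-native-sidecar annotation: the proxy runs as a regular sidecar.
      # Command Argo runs in the linkerd-proxy container to stop it.
      workflows.argoproj.io/kill-cmd-linkerd-proxy: '["/usr/lib/linkerd/linkerd-await","sleep","1","--shutdown"]'
```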

This works, but injecting mesh proxies as native sidecars seems preferable.
