Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return NotFound for unknown pod names #11540

Merged
merged 2 commits into from
Nov 1, 2023
Merged

Return NotFound for unknown pod names #11540

merged 2 commits into from
Nov 1, 2023

Conversation

adleong
Copy link
Member

@adleong adleong commented Oct 26, 2023

Fixes #11065

When an inbound proxy receives a request with a canonical name of the form hostname.service.namespace.svc.cluster.domain, we assume that hostname is the hostname of the pod as described in https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields. However, pods are also addressable with pod-ip.service.namespace.svc.cluster.domain. When the destination controller gets a profile request of this form, we attempt to find a pod with hostname of pod-ip and return an error with gRPC status Unknown since this will not exist.

It is expected that this profile lookup will fail since we cannot have service profiles for individual pods. However, returning a gRPC status Unknown for these requests brings the reported success rate of the destination controller down. Instead we should return these as gRPC status NotFound so that these responses don't get reported as server errors.

@adleong adleong requested a review from a team as a code owner October 26, 2023 22:05
Copy link
Member

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Simple and straight to the point. I tested it and it works well.

One tiny suggestion. The change is made in getEndpointByHostname() which in turn will be called from getOrNewPodPublisher. In the latter, we are already formatting the error (see the linked snippet)

if hostname != "" {
pod, err = pw.getEndpointByHostname(hostname, service)
if err != nil {
return nil, fmt.Errorf("failed to get pod for hostname %s: %w", hostname, err)
}
ip = pod.Status.PodIP
} else {

This yields the following error message:

:; go run controller/script/destination-client/main.go -token "{}" -path 10-23-0-37.nginx-svc.default.svc.cluster.local:80  -method getProfile
FATA[0000] rpc error: code = NotFound desc = failed to get pod for hostname 10-23-0-37: rpc error: code = NotFound desc = no pod found in Endpoints default/nginx-svc for hostname 10-23-0-37
exit status 1

If we take the formatting out of the calling function (pod watcher one) the error should look nicer:

:; go run controller/script/destination-client/main.go -token "{}" -path 10-23-0-37.nginx-svc.default.svc.cluster.local:80  -method getProfile
FATA[0000] rpc error: code = NotFound desc = no pod found in Endpoints default/nginx-svc for hostname 10-23-0-37
exit status 1


Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

@adleong adleong merged commit 8c42a50 into main Nov 1, 2023
34 checks passed
@adleong adleong deleted the alex/not-found-grpc branch November 1, 2023 22:44
olix0r added a commit that referenced this pull request Nov 2, 2023
This edge release fixes two bugs in the Destination controller that could cause
outbound connections to hang indefinitely.

* helm: Introduce configurable values for protocol detection ([#11536])
* destination: Fix GetProfiles error when address is opaque and unmeshed ([#11556])
* destination: Return NotFound for unknown pod names ([#11540])
* proxy: Log controller errors at WARN
* proxy: Fix grpc_status metric labels for inbound traffic

[#11536]: #11536
[#11556]: #11556
[#11540]: #11540
@olix0r olix0r mentioned this pull request Nov 2, 2023
olix0r added a commit that referenced this pull request Nov 2, 2023
This edge release fixes two bugs in the Destination controller that could cause
outbound connections to hang indefinitely.

* helm: Introduce configurable values for protocol detection ([#11536])
* destination: Fix GetProfiles error when address is opaque and unmeshed ([#11556])
* destination: Return NotFound for unknown pod names ([#11540])
* proxy: Log controller errors at WARN
* proxy: Fix grpc_status metric labels for inbound traffic

[#11536]: #11536
[#11556]: #11556
[#11540]: #11540
@annismckenzie
Copy link

Could this fix be backported to / included in the 2.14 release branch? Since linkerd/linkerd2-proxy#2499 was merged and released in v2.14.3 we're now experiencing a lot of log spam when using headless services (as explained in the description above).

@wmorgan
Copy link
Member

wmorgan commented May 2, 2024

@annismckenzie Unfortunately we are no longer backporting to 2.14 except for critical security patches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed requests to destination when using headless services
5 participants