-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return NotFound for unknown pod names #11540
Conversation
Signed-off-by: Alex Leong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Simple and straight to the point. I tested it and it works well.
One tiny suggestion. The change is made in getEndpointByHostname()
which in turn will be called from getOrNewPodPublisher
. In the latter, we are already formatting the error (see the linked snippet)
linkerd2/controller/api/destination/watcher/pod_watcher.go
Lines 277 to 283 in 29e6683
if hostname != "" { | |
pod, err = pw.getEndpointByHostname(hostname, service) | |
if err != nil { | |
return nil, fmt.Errorf("failed to get pod for hostname %s: %w", hostname, err) | |
} | |
ip = pod.Status.PodIP | |
} else { |
This yields the following error message:
:; go run controller/script/destination-client/main.go -token "{}" -path 10-23-0-37.nginx-svc.default.svc.cluster.local:80 -method getProfile
FATA[0000] rpc error: code = NotFound desc = failed to get pod for hostname 10-23-0-37: rpc error: code = NotFound desc = no pod found in Endpoints default/nginx-svc for hostname 10-23-0-37
exit status 1
If we take the formatting out of the calling function (pod watcher one) the error should look nicer:
:; go run controller/script/destination-client/main.go -token "{}" -path 10-23-0-37.nginx-svc.default.svc.cluster.local:80 -method getProfile
FATA[0000] rpc error: code = NotFound desc = no pod found in Endpoints default/nginx-svc for hostname 10-23-0-37
exit status 1
Signed-off-by: Alex Leong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
This edge release fixes two bugs in the Destination controller that could cause outbound connections to hang indefinitely. * helm: Introduce configurable values for protocol detection ([#11536]) * destination: Fix GetProfiles error when address is opaque and unmeshed ([#11556]) * destination: Return NotFound for unknown pod names ([#11540]) * proxy: Log controller errors at WARN * proxy: Fix grpc_status metric labels for inbound traffic [#11536]: #11536 [#11556]: #11556 [#11540]: #11540
This edge release fixes two bugs in the Destination controller that could cause outbound connections to hang indefinitely. * helm: Introduce configurable values for protocol detection ([#11536]) * destination: Fix GetProfiles error when address is opaque and unmeshed ([#11556]) * destination: Return NotFound for unknown pod names ([#11540]) * proxy: Log controller errors at WARN * proxy: Fix grpc_status metric labels for inbound traffic [#11536]: #11536 [#11556]: #11556 [#11540]: #11540
Could this fix be backported to / included in the 2.14 release branch? Since linkerd/linkerd2-proxy#2499 was merged and released in v2.14.3 we're now experiencing a lot of log spam when using headless services (as explained in the description above). |
@annismckenzie Unfortunately we are no longer backporting to 2.14 except for critical security patches. |
Fixes #11065
When an inbound proxy receives a request with a canonical name of the form
hostname.service.namespace.svc.cluster.domain
, we assume thathostname
is the hostname of the pod as described in https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields. However, pods are also addressable withpod-ip.service.namespace.svc.cluster.domain
. When the destination controller gets a profile request of this form, we attempt to find a pod with hostname ofpod-ip
and return an error with gRPC statusUnknown
since this will not exist.It is expected that this profile lookup will fail since we cannot have service profiles for individual pods. However, returning a gRPC status
Unknown
for these requests brings the reported success rate of the destination controller down. Instead we should return these as gRPC statusNotFound
so that these responses don't get reported as server errors.