
API Authorization from Outside EKS Cluster throws Unauthorized error after 20+ minutes #590

Closed
Zhenye-Na opened this issue Apr 12, 2023 · 14 comments · May be fixed by #737
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@Zhenye-Na

We have tried to implement the approach described in https://github.com/kubernetes-sigs/aws-iam-authenticator#api-authorization-from-outside-a-cluster in Go:

// NewStsClientFrom, KubeToken, and KubeTokenStatus are our own helpers/types.
func GetEKSToken(ctx context.Context, clusterName string) (*KubeToken, error) {
	request, _ := NewStsClientFrom(ctx).GetCallerIdentityRequest(&sts.GetCallerIdentityInput{})
	// pass the cluster name in the request header
	request.HTTPRequest.Header.Add("x-k8s-aws-id", clusterName)

	// presign for 60 seconds (Presign takes a time.Duration, not a bare int)
	presignURL, err := request.Presign(60 * time.Second)
	if err != nil {
		return nil, err // (error handling elided in the original)
	}

	return &KubeToken{
		Kind:       "ExecCredential",
		ApiVersion: "client.authentication.k8s.io/v1beta1",
		Status: &KubeTokenStatus{
			// format as UTC so the trailing "Z" is accurate
			ExpirationTimestamp: time.Now().UTC().Add(time.Hour).Format("2006-01-02T15:04:05Z"),
			Token:               "k8s-aws-v1." + base64.RawURLEncoding.EncodeToString([]byte(presignURL)),
		},
	}, nil
}

It works fine.

However, we are seeing an intermittent issue where the Kubernetes client created with this token throws an Unauthorized error when performing Kubernetes operations (for example, the equivalent of kubectl get nodes).

We are using https://kubernetes.io/docs/reference/config-api/client-authentication.v1beta1/#client-authentication-k8s-io-v1beta1-ExecCredential instead of passing headers like headers = {'Authorization': 'Bearer ' + get_bearer_token('my_cluster', 'us-east-1')} as in the tutorial.

I am wondering what the potential reason for this Unauthorized error could be. The API ran successfully for almost 20 minutes and then this error was suddenly thrown.

I am thinking:

  1. Does the bearer token expire exactly at the timestamp defined in ExpirationTimestamp, or after some magical time delta? Currently we configure ExpirationTimestamp to be 1 hour after the token is generated. Does this conflict with the 60-second STS presign?
  2. I noticed the README mentions "Something to note though is that the IAM Authenticator explicitly omits base64 padding to avoid any = characters thus guaranteeing a string safe to use in URLs." and shows this Python example:

    # remove any base64 encoding padding:
    return 'k8s-aws-v1.' + re.sub(r'=*', '', base64_url)

    which explicitly replaces = with an empty string. That step is absent from our Go method, yet so far everything related to Kubernetes operations has worked fine (see the quick check below).
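Looking at the Go standard library, base64.RawURLEncoding appears to already omit padding, which would make it equivalent to the Python re.sub above and may be why our tokens work. A quick standalone check (illustrative only, not our production code):

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	data := []byte("presigned-url") // 13 bytes; 13 mod 3 == 1, so the padded encoding ends in "=="
	fmt.Println(base64.URLEncoding.EncodeToString(data))    // cHJlc2lnbmVkLXVybA==
	fmt.Println(base64.RawURLEncoding.EncodeToString(data)) // cHJlc2lnbmVkLXVybA
}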

Also, I would appreciate any feedback on whether there is anything else I am missing.

@Zhenye-Na
Author

Bumping this issue again since no replies have been received after a month.

@iamnoah

iamnoah commented May 18, 2023

Also seeing this. We see a window where a generated token sometimes gets Unauthorized for 3-5 minutes before it expires. The structure of the Go client auth plugins keeps us from detecting the problem and regenerating the token, so we have short-lived outages. It seems to happen after about an hour.

Our client is running in an EKS cluster using IRSA and communicating with a different EKS cluster.

@iamnoah

iamnoah commented May 19, 2023

I think I understand what is going on in our case. We are using the token.Generator without passing in a Session, in a k8s Pod using IRSA. The IAM Role's MaxSessionDuration is 1 hour. So what happens is:

  1. t0: Pod Starts Up
    a. Using the WEB_IDENTITY_TOKEN_FILE, does an sts:AssumeRoleWithWebIdentity, getting a session that is valid for 1 hour
    b. Using that session, generate an EKS token (by presigning a GetCallerIdentity request)
  2. +14m - token expires, the same session is used to generate a new token
  3. +28m, 42m, 56m - generate a new token with the same session
  4. At 1 hour, the original session is expired, but the last token we generated says it has 10m before it expires.
  5. Around 1h3m, EKS starts to reject the last generated token. Not sure why there is a 3m delay, but it's pretty consistent. Probably some kind of grace period in STS.
  6. When that token expires, the SDK detects that the session is also expired, creates a new session, and uses that to generate a new token and everything is fine again.

So the problem is that if you use a session that is about to expire to presign, you get less than the 15 minutes assumed in the code. The correct expiration would be min(session.Expiry, time.Now() + 15m)
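For illustration, a minimal sketch of that calculation (effectiveTokenExpiry is a hypothetical helper; in the v1 SDK the credential expiry can be read via Credentials.ExpiresAt()):

import "time"

// effectiveTokenExpiry returns the earlier of the session credential expiry
// and the 15-minute presign window, i.e. min(session.Expiry, time.Now() + 15m).
func effectiveTokenExpiry(credsExpiry time.Time) time.Time {
	presignExpiry := time.Now().Add(15 * time.Minute)
	if credsExpiry.Before(presignExpiry) {
		return credsExpiry
	}
	return presignExpiry
}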

@iamnoah

iamnoah commented May 19, 2023

One workaround is to expire the session early:

s, err := session.NewSessionWithOptions(session.Options{
	SharedConfigState: session.SharedConfigEnable,
	CredentialsProviderOptions: &session.CredentialsProviderOptions{
		WebIdentityRoleProviderOptions: func(provider *stscreds.WebIdentityRoleProvider) {
			// When the session expires, pre-signed tokens seem to become invalid within 3 minutes,
			// even if they were created <15 minutes ago. Expiring the session 12.5 minutes early
			// should keep the token from falling into this window.
			provider.ExpiryWindow = 12*time.Minute + 30*time.Second
		},
	},
})
if err != nil {
	// ... handle error
}
tok, err := gen.GetWithOptions(&token.GetTokenOptions{
	Session: s,
	// set ClusterID, etc.
})
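As far as I can tell, the 12.5-minute window works out like this: in the worst case a token is presigned with about 12.5 minutes of real session time left, and together with the roughly 3-minute grace period observed above that still covers the token's 15-minute presign lifetime, so the SDK refreshes the session before a token can go invalid early.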

(last comment, sorry for hijacking this ticket)

iamnoah added a commit to SecurityJourney/aws-iam-authenticator that referenced this issue May 19, 2023
Fixes kubernetes-sigs#590

This comes up when using a long-lived `token.Generator` instance where the underlying assume-role session might expire, invalidating the token.
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2024
@Zhenye-Na
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@hjkatz

hjkatz commented Feb 6, 2024

We're also experiencing this issue using kubectl --kubeconfig with these settings:

        {
            "name": "example-user",
            "user": {
                "exec": {
                    "command": "aws",
                    "args": [
                        "--profile",
                        "my-profile",
                        "--region",
                        "us-west-2",
                        "eks",
                        "get-token",
                        "--cluster-name",
                        "my-cluster"
                    ],
                    "env": [],
                    "apiVersion": "client.authentication.k8s.io/v1beta1",
                    "provideClusterInfo": false
                }
            }
        }
Should we enable any additional config options?

@pravinchandar

Hey folks, I'm running into this issue as well, wondering if there's an update?

@iamnoah I also tried your patch, but I'm still seeing Unauthorized a few minutes after the pod starts.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 3, 2024
@Zhenye-Na
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2024
alvaroaleman added a commit to alvaroaleman/aws-iam-authenticator that referenced this issue Aug 6, 2024
What this PR does / why we need it:

There are two expirations which must be considered when using a signed EKS token:

* 15 minutes after the point in time when the AWS STS request was signed
* The underlying AWS credentials can expire, at which point the token won't be accepted

The second case is particularly common when making frequent requests while using AssumeRole or AssumeRoleWithWebIdentity, as mentioned in kubernetes-sigs#590, since the default session duration is 1 hour.

This PR adds an additional check that fetches the AWS credential expiration and uses it as the returned expiration if it is before the 15-minute token expiration.

Which issue(s) this PR fixes

Fixes kubernetes-sigs#590
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 3, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 3, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 3, 2024