Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve git credential handling in Kubernetes #1577

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sgaist
Copy link
Contributor

@sgaist sgaist commented Nov 17, 2022

The git credentials are now stored in a secret following recommended best practice.

The secret will have the same lifespan as the builder pod.

@sgaist
Copy link
Contributor Author

sgaist commented Nov 17, 2022

One note: we might also want to consider making the pod owner of the secret which would automatically trigger its deletion along side the pod.

However, if for some reason the pod fails to be created the secret would remain so there's a need to clean it up manually in that case.

@sgaist sgaist force-pushed the improve_git_credential_handling branch from 186c740 to eb0d9ee Compare November 17, 2022 16:46
binderhub/build.py Outdated Show resolved Hide resolved
Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A chart managed secret / a dynamically created secret?

Is the git credentials the same for all build pods? If it is, we can just have a Helm chart template create a k8s Secret with a key in it representing the credentials.

Single / Multiple k8s Secrets

Have it been considered to use just one single secret that has several keys, one per build pod? Could be better, could be worse. It would feel good to know we have reflected on it a bit.

Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git_credentials = Unicode(

I think the way to go is a much smaller change, where the pod we dynamically create just references a Helm chart managed k8s Secret. If this is to create a new k8s Secret template, or to reference an existing k8s Secret, is not something I've explored yet.

The k8s Secret needs to be located in the same namespace as the build pods, but a Helm chart can specify a custom namespace if needed in a k8s resource's metadata.

@consideRatio consideRatio added the maintenance Under the hood improvements and fixes label Nov 18, 2022
@sgaist
Copy link
Contributor Author

sgaist commented Nov 18, 2022

@consideRatio Thanks for your suggestions. To answer your questions/remarks:

Currently the credentials are static by provider but my implementation was done so on purpose: it takes into account #1169 (Allow dynamic repository credentials for authenticated Binderhub instances) that I am interested in as well.

###A chart managed secret / a dynamically created secret?

If we take the static credential use case, having one single chart managed secret is a good idea. The git_credentials property could be nuked in favor of setting the key matching the repo provider.

If we take the #1169 use case into account, we would need to ensure that we cleanup the content of the secret to avoid keeping tokens (and a growing secret) for more time than necessary (i.e. user session length + builder lifetime).

From the top of my head, this could be done in the following way:

  • Start build
  • Update secret with token using pod name as key
  • Run build
  • Update secret and remove the token

The thing we need to determine now is how to differentiate a static credential VS a user provided one.

###Single / Multiple k8s Secrets

In the static case, I think one single secret would make sense.

For the #1169 use case, it's a different story. We would likely need to benchmark things a bit. Kubernetes secrets cannot be larger than 1MIB but one should also not create too many little Secrets in a given namespace as it could also exhausts memory. The documentation also suggest to use resource quotas in order to limit their number.

@consideRatio
Copy link
Member

Ah okay!

This fix becomes more complicated if planning for #1169 at the same time, to support runtime decided git credentials, but we can go for it if you want. I'd appreciate if another maintainer could opinie about the questions below though, as I'm not comfortable approve/merging this without further agreement otherwise.

Question 1: Is there agreement to add support for runtime decided git credentials?

Doing it imples managing k8s Secrets via the binderhub software itself, as compared to the BinderHub Helm chart.

Question 2: Should we support only runtime decided git credentials, or a mix?

If using binderhub software managed k8s Secrets by default, it means that we take some performance hit as pod must wait for the k8s Secret creation, which follows the pod creation in sequence, and burden the k8s api-server a bit by additional requests. It may be miniscule though. So, if we switch to this where its not used, this is a compromise even though it may possibly be very small. At the same time, if we start supporting both options with a single Helm chart managed k8s Secret and runtime created k8s Secrets, we increase the complexity of configuration/documentation and the complexity of the code base as a whole.

Implementation criterias for runtime decided git credentials

  • Use ownerReferences
    This seems like the only robust way to ensure that we cleanup secondary resources without needing to track another type of k8s resource. There is an implementation for this in kubespawner already, there may be room for improvements in how its implemented but a key implementation detail is that the k8s Secret should be created after the Pod.
  • Use of typical labels for created resources
    Any resources created should have a few labels to indicate who created it. Helm charts creating resources should have labels like these, where its not obvious what we should set when its resources created by software, which in turn was installed by a Helm chart. I'm not 100% on what values we provide, or if we should set the helm.sh/chart label as its indirectly created only, but we should at least provide some labels indicating this origin I think.

@consideRatio
Copy link
Member

My own response to the questions above:

  • Q1: I think allowing for runtime decided git credentials is acceptable to support, that the complexity added is sufficiently motivated, but I'm not fully confident about this.
  • Q2: I think I'd rather see a single flexible approach that then must be found acceptable for everyone, even for those not using this, as that would reduce the complexity of the system as a whole. I'm worried otherwise that this strategy is used by very few, and we need to ask "oh are you using ... or not?" and giving special treatments to this when testing and trying to come to the botton of a reported issue etc.

@minrk you worked to provide the kubespawner implementation dynamically creating k8s Secret resources for internal_ssl, what do you think?

@sgaist
Copy link
Contributor Author

sgaist commented Nov 18, 2022

On the implementation side, the secret can be created before the pod and the ownership applied through patching. I did a small test locally. It can be done as soon as the pod enters the pending state.

I haven't benchmarked it but it would likely reduce the performance hit of having the pod waiting on the secret presence to be properly started.

The downside is that if the pod creation fails the secret needs to be manually cleaned up.

@consideRatio
Copy link
Member

consideRatio commented Nov 18, 2022

The downside is that if the pod creation fails the secret needs to be manually cleaned up.

Yepp, a quite significant downside i think. I'd rather make sure we dont emit credentials we fail to clean up than optimize performance to this degree - based on guessed impact scale.

The git credentials are now stored in a temporary secret following recommended
best practice. The secret will have the same lifespan as the builder pod.
This makes the code simpler and lets k8s encode the data
@sgaist sgaist force-pushed the improve_git_credential_handling branch from d62951e to fc24709 Compare December 5, 2022 16:21
The final implementation comes from the following design decision:
Only create the secret once the pod is ready so we reduce the surface
were sensitive data is managed and avoid its management if the pod fails
for some reason.
@sgaist sgaist force-pushed the improve_git_credential_handling branch from fc24709 to a81abe3 Compare December 5, 2022 20:14
@sgaist
Copy link
Contributor Author

sgaist commented Dec 16, 2022

@consideRatio I was wondering one thing: should the secret creation code be put in its own function ?

@manics
Copy link
Member

manics commented Dec 17, 2022

Note the git credential is also needed by BinderHub before the build pod is created, since BinderHub checks the registry for an existing image so the Git SHA is required:

try:
provider = self.get_provider(provider_prefix, spec=spec)
except Exception as e:
app_log.exception("Failed to get provider for %s", key)
await self.fail(str(e))
return
if provider.is_banned():
await self.emit(
{
"phase": "failed",
"message": f"Sorry, {spec} has been temporarily disabled from launching. Please contact admins for more info!",
}
)
return
repo_url = self.repo_url = provider.get_repo_url()
# labels to apply to build/launch metrics
self.repo_metric_labels = {
"provider": provider.name,
"repo": repo_url,
}
try:
ref = await provider.get_resolved_ref()
except Exception as e:
await self.fail(f"Error resolving ref for {key}: {e}")
return

This may complicated things if implementing dynamic per-user/repo Git credentials. I've hit a similar issue whilst trying to disentangle another property:
#1594

If we go down the route of dynamic docker config it would make sense to keep these two aligned, so if you have any thoughts on that issue please share them!

@manics
Copy link
Member

manics commented Feb 11, 2023

I've been thinking about this because I want to get temporary registry credentials into the build image to push to ECR (push tokens are time-limited so a static secret won't work #1623).

Using secrets is definitely best practice, but on balance I think managing dynamic secrets overcomplicates things for BinderHub due to the clean-up issue- it's a shame you can't atomically create a secret+pod. ownerReferences may help, but there's a circular dependency between the pod and secret and therefore a race condition.

In the worst case the implementation in https://github.com/jupyterhub/kubespawner/blob/0f22abf096459b5424325fa93ff4b30bef568391/kubespawner/spawner.py#L2714-L2732 will lead to N orphaned secrets where N = number of users, whereas in Binderhub the secrets could be per-user-per-repo so there's an unlimited number.

If we want to pursue this I think a compromise of using a Helm chart managed secret for global credentials, and plain env-vars for dynamic credentials, is reasonable. It requires additional logic in the build class to choose between mounting the global secret vs ignoring it and setting an environment variable, but we'll need additional logic to support dynamic credentials anyway. The small risk of using env vars can be mitigated by ensuring any dynamic credentials are time limited- they most likely will be anyway, e.g. OAuth tokens expire, as do ECR registry credentials. You could probably make the case that a dynamic expiring env var is more secure than a one year old static secret 😃

Finally if designed correctly this will be an implementation detail so there's nothing stopping us from switching in future.

@sgaist
Copy link
Contributor Author

sgaist commented Feb 17, 2023

Related to your post from past December (that I was sure I answered to so sorry for the delay):
Where you writing in the context of pushing the image back to the same repository as the code comes from ? Otherwise, it's one unique token that could be stored as a static secret managed by helm.


For your new setup, it sounds a bit like the forge use case. In between, one related thing I have discovered that could be of interest: the Secrets Store CSI driver. Note that I haven't tested it yet.
The goal of this project is to move the secrets into a dedicated strong storage and the pods use a dedicated provider to retrieve them.
There's an AWS provider however I currently don't know if it would be usable for your use case.
While interesting, it adds a new layer of indirection which needs to be evaluated first.


As for the orphaned secrets, I am wondering whether having a cleanup CronJob would help in that case ? Note that the current implementation should properly clean everything in case of failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Under the hood improvements and fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants