
Observability platform multi-tenancy #85

Open: wants to merge 27 commits into main
Conversation

marieroque

@marieroque requested a review from a team on February 8, 2024 at 15:08
@QuentinBisson (Contributor)

@giantswarm/sig-architecture this is not a draft anymore so I would love your opinion on the matter.

@puja108 I added the write path if you want to take another look :)


But this approach brings a few questions:
1. What tenancy should common internal apps use (e.g. prometheus-operator, api-server, cilium logs are accessed by multiple internal teams)? Do we need to ensure customers are putting those apps into a namespace accessible by all?
2. What about teams that share namespaces?
Member

From what I know from customers, we rarely have that outside of kube-system

Contributor

It's also the case for the monitoring namespace, and it would probably be the same tenants that can access the kube-system namespace.

This approach makes it really easy for us and customers to configure tenants and it would definitely be easy to onboard teams (e.g. adding a label on the namespace would make the agents collect logs from the containers in the namespace).
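As a rough sketch of that opt-in (the exact label key is not fixed anywhere in this thread, so the one below is only illustrative):

```yaml
# Illustrative only: a namespace opted into log collection, with the tenant
# taken from a label. The label key shown here is an assumption, not a decision.
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
  labels:
    giantswarm.io/tenant: team-payments
```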

But this approach brings a few questions:
1. What tenancy should common internal apps use (e.g. prometheus-operator, api-server, cilium logs are accessed by multiple internal teams)? Do we need to ensure customers are putting those apps into a namespace accessible by all?
Member

Not sure I understand the question: if they are separated by namespace = tenant, then if someone wants everyone to have access to them, they add everyone to that tenant mapping. In most cases, especially the ones you mentioned above, I doubt everyone would have access to those.

Contributor

From my discussions with customers, prometheus-operator was moved to another namespace because teams did not have access to kube-system logs :D

Contributor

So my main question in this case is more about our managed apps, because customers can install them in whatever namespaces they want, and they can also run alongside the customers' own workloads, as is the case with prometheus-operator. Right now we technically have access to all the logs because we manage the clusters, but in Loki we could theoretically reduce our access to only the kube-system and giantswarm namespaces plus the managed apps' logs.

Do we want to go in that direction or should Giant Swarm have access to all logs by default?

- This is far from the idea of a platform
- There is a high risk that configuration will be messy and hard for us to debug when we need to

This approach might be useful for customers that want to send logs from outside the clusters
Member

Can we enable that as an alternative route for the "outside the cluster" use case alone?

Contributor

I do not see why not :)

- Quite easy to configure for logs

Cons:
- Tenant configuration will need to be configured in multiple places (MC for the read path and WC for the write path)
Member

Might not be a big problem: the read path is access management and the MC is a good place for that, while the write path is workload-related, so deploying it together with the workload makes sense. The two lifecycles are separate.

- With this approach, we could still have a `giantswarm.io/tenant: my-tenant` label on some apps that should have a specific tenant attached to them.

Cons:
- Our existing customers want to start slow (team by team) and opting-out for all namespaces from the start will be a hassle. An idea to avoid this would be to be able to opt-out of the multi-tenancy features with a cluster label (`giantswarm.io/tenancy: false`).
Member

We can find that out with customers. AFAIK a lot of customers have managed namespaces that are controlled by the platform team, where they should be able to add a default opt-out label on creation, as they already add other things there.

Contributor

You could probably roll out slowly (and under shared control) with Policy API. We can discuss

@puja108 (Member) left a comment

Thanks, I think I understand much better now (also thanks to your explanations on other RFCs and previous comments). As I kinda suggested, I'm in favour of approach 4 and I like that there's flexibility on top. We only need to find out a bit more about opting out and how feasible that is.

Main next step would be the read path mapping config we'd want to expose to customers. My main concern there is that we would most probably have a 1:1 mapping between our o11y permissions to namespaces and WC RBAC permissions to namespaces. Having a config interface (like a CR) makes sense, but we might be able to somehow generate it to reflect WC RBAC. We could sync with Big Mac to see what we currently offer in terms of WC permission management. IIRC there's an app we offer, but most customers might be using their own config.


What interface could we offer our customers to enable them to define the mapping between their groups and their tenants:

- Custom resources like TenantCR, GroupCR
Contributor

IMO this is the purpose of the Project CR we envisioned for the Dev Platform story.

Member

It is related, but I can see us pulling the concept apart so we can "translate" a Project CR into specific custom CRs, like an o11y-tenant CR or something. The Project CR should depict project/namespace/team and other info, but here, for example, we only map an OIDC group to an o11y tenant, so this is more like an RBAC binding.
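A very rough sketch of what such a dedicated o11y-tenant binding could look like; the group/version, kind and fields are all made up for illustration and nothing here is decided:

```yaml
# Hypothetical CR: binds an OIDC group to an observability tenant,
# independently of a broader Project CR.
apiVersion: observability.giantswarm.io/v1alpha1  # assumed group/version
kind: TenantBinding                               # assumed kind
metadata:
  name: payments-logs
spec:
  tenant: team-payments            # tenant ID used as X-Scope-OrgID on reads
  groups:
    - customer:payments-admins     # OIDC group(s) allowed to read this tenant
```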

Contributor

Hmm yeah, I guess decoupling those will give us more flexibility. I am just afraid we may create too much complexity. That said, tying everything to a single CR could be dangerous.

Contributor

In the beginning, we can have this maintained by us via Giant Swarm config until we think about the UX side, so as not to block this RFC, but I'm not sure where to discuss that (should we create a WG around multi-tenancy)?

Here is what the flow looks like:
1. The logging-operator creates a `password` for the cluster's logging agents to be able to send logs to Loki and configures them (the actual configuration of what to scrape and so on) via an extra-config.
2. The logging-operator adds the new cluster_id/password pair to the proxy configuration (see the sketch below).
3. The logging agents collect and send logs to the loki-multi-tenant-proxy, which validates the basic authentication sent in the HTTP header and transforms it into tenant information (the X-Scope-OrgID header).
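For illustration, the cluster_id/password pairs maintained by the logging-operator could land in the proxy roughly like this; the field names follow the upstream loki-multi-tenant-proxy authn file as I understand it, so treat them as an assumption rather than the final format:

```yaml
# Sketch of the proxy's basic-auth -> tenant mapping: each cluster gets a
# generated password, and the proxy turns a valid login into X-Scope-OrgID.
users:
  - username: prod-cluster-1        # cluster_id used by the logging agents
    password: <generated-password>  # created by the logging-operator
    orgid: prod-cluster-1           # tenant injected as the X-Scope-OrgID header
```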
Contributor

What about a future case where customers can define multiple tenants for the write path? Is that possible? Does it require multiple instances of promtail and/or grafana-agent?

Member

From what we discussed before, IIRC we can have multiple tenants, but it would mean some data duplication. But @QuentinBisson can explain it better.

Contributor

Yes, exactly. The tenant can be set in promtail as a label via what they call a pipeline stage, so customers can set multiple tenants if they want, but that means the data will be stored and queried twice, so we would advise against it :)

This is why the idea of setting a tenant per namespace makes sense :)
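For reference, this is roughly what promtail's tenant pipeline stage looks like (a sketch, not our actual agent configuration; the tenant name is illustrative):

```yaml
# Promtail pipeline sketch: the tenant stage sets the tenant (sent to Loki as
# the X-Scope-OrgID header) for the log lines flowing through this scrape job.
scrape_configs:
  - job_name: kubernetes-pods
    # kubernetes_sd_configs and relabeling omitted for brevity
    pipeline_stages:
      # Static form: everything scraped by this job goes to one tenant.
      - tenant:
          value: team-payments
      # Dynamic form (alternative): take the tenant from a value extracted by
      # an earlier stage, e.g. the namespace.
      # - tenant:
      #     source: namespace
```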

@stone-z (Contributor) left a comment

The structure presented here overall makes sense to me and I agree with Atlas's current intent.

I would add:

  1. App-level granularity seems inevitable to me. See my comment below.
  2. To achieve app-level granularity, the theoretically best approach IMO would be to use workload identities in some form. Logging as a platform feature will ship long before identity as a platform feature, but if logging is designed with that future in mind, it would be very powerful.
  3. Shield has very similar use cases for security data. We are also building toward an identity-based model, but are currently working with (essentially) app-level granularity.
  4. For the initial implementation, +1 on a single operator. Individual controllers can be spun out on their own if future needs demand it (Shield is doing the same for Policy API controllers).

Question/opinion: this seems like an enterprise feature of the underlying tools. Do you know of or anticipate any artificial limits from that? E.g. is there a limit to the number of tenants that would actually make per-app tenancy impossible?

Comment on lines +202 to +204
##### App == Tenant

This approach is especially flexible for customers, but we do not think it makes sense because the mapping configuration would be impossible to maintain and we would probably face technical limits pretty fast
Contributor

IMO this is the natural end state.

Two valid requests:

  • "The team that owns namespace X can view logs from any application in that namespace"
  • "The monitoring team can view prometheus logs from any namespace in any cluster where they run prometheus"

Roughly app-level granularity is inevitable in my mind. It also works well with other potential features (described in the general comment).

Contributor

It could be, but then I do not think it works for everything.

Usually, customers run applications for one team in one namespace with a clear boundary. Sure, prometheus might be a bit different, but that's really dependent on the customers' organizations.

What I do not like about this is that if you want all the logs of all apps in one namespace, then Loki will actually run at least as many queries as there are apps. Sure, they're parallelized and so on, but that looks like a bit too much to me for most use cases.

Contributor

> Loki will actually run at least as many queries as there are apps

The actual query implementation is beyond my knowledge 😅 Sure, it might be prohibitively expensive, or this might be intentionally nerfed in OSS and reserved for the enterprise version.

BUT to the greatest extent possible, I would double down on the likelihood of needing app-level granularity support. Some other examples:

  • any type of less-trusted collaborators (externals, students, etc.) who should have access to only certain components inside a namespace
  • selective access to "system" resources - for example, there are multiple types of users who will need access to coredns or nginx, but they shouldn't have access to everything in the namespace, especially kube-system stuff.
  • automation - similar to above, but for machine accounts.

If it is possible to change the tenant behavior later, maybe logging-operator could default to tenant-per-ns until the first app-level rule is applied, or create app tenants only for those targeted by allow rules. This probably means the capability would only apply to future logs, which is a tradeoff, but could be a workable middle ground.

Member

Yes, from what I see, we can add levels of granularity on demand, and we could even make that opt-in for apps: it's not tenanted by app by default, but there are additional tenants for apps where we see demand for selective access control (e.g. we might at some point pull nginx or coredns tenants into our default mapping). WDYT?

Contributor

Yeah maybe so. I'm not sure about the Loki internals or implications for trying to change tenancy later on, I would just assume it wouldn't be possible to do retroactively. For example, if a Loki tenant is for a namespace, a new app-level tenant would have no historical logs for that app because they'd still be associated with the old namespace tenant. Possibly queries would also need to be adapted to account for the moved logs, but I'm not familiar enough to say for sure.

Member

Yes, from what I understood a new tenant is a new and separate datastore, which makes duplication of data a topic in that context, but it also raises the issues you mentioned: you basically won't be able to access historical data from before the tenant's creation. Not sure how that looks with queries, but I would assume they should not change, as tenants should not influence the data. But you never know, maybe @QuentinBisson knows more.

Contributor

Yes, you can think of it as a datastore, and a tenant is only an HTTP header, so there is no reason why customers who want to migrate to one tenant per app would lose old logs if they keep querying the old and new tenants together :)
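As an illustration of "keeping the old and new tenants together" on the read path: Loki can fan a query out over several tenants when the X-Scope-OrgID header carries a |-separated list (cross-tenant query federation has to be enabled on the Loki side). A Grafana datasource provisioned that way might look like this, with the tenant names purely illustrative:

```yaml
# Grafana datasource provisioning sketch: one Loki datasource that reads from
# both the old namespace tenant and the new per-app tenant, so historical logs
# written under the old tenant stay reachable.
apiVersion: 1
datasources:
  - name: Loki (team-payments)
    type: loki
    url: http://loki-gateway.loki.svc:3100
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: team-payments|team-payments-prometheus
```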

- Tenant configuration will need to be configured in multiple places (MC for the read path and WC for the write path)
- Label approach would most likely not work for metrics OOTB (this would require customer to add new labels like metrics port, path and so on).

#### Approach 3: Configure promtail based on the MC tenancy CR
Contributor

Unless the community provides a better option, some Giant Swarm abstraction is the likely end state for this use case. BUT, whether it is WC or MC level depends on architectural decisions about the Project CR which haven't been made yet.

Assuming a GitOps setup, consider the possible relationships between a GS Project CR and Flux. Does the Project CR live inside an application repository? If so, it would be a WC level resource. If the Project CR is managed separately from the application, it could be an MC-level resource. Those have tradeoffs and to my knowledge they haven't been discussed yet.

So this mechanism could be WC-level (or could have complementary abstractions, like a label-based version of the same Project CR fields).

Contributor

So the issue I have is that, in my mind, a Project CR would need to be in the MC, because a project can span clusters, especially if customers have a cluster per stage, as is often the case.

Contributor

Sure, maybe the Project CR ends up being MC-scoped, it just hasn't been decided. It definitely needs wider discussion. My comment here was mostly to consider whether there are aspects of logging that need to be under dev control, and if so, what that interface would look like.

Member

We might in any case need a leveled approach where we have a Project CR on the MC level and some other CR on the WC level (a project needs to be deployed to x WCs). But even that WC level in some cases lives on the MC, when you look at HelmRelease or App CRs for example. Other parts of the WC level would live within the WC, and that's most probably unavoidable. Thus, to me it means we can handle these independently. It depends a lot on the architecture of the implementation, which the con below shows. Handling some delegation from an MC-level resource to a WC-level one is not a big issue, and we do that regularly with Apps anyway.

@QuentinBisson (Contributor) commented Mar 14, 2024

@stone-z
How do you see workload identity having any relation to logging? I would be interested to know why you think they would be related. To me, tenancy is about who has access to the logs, so maybe the identity would then be a label in the log, but it should not define a user tenancy model.

> Question/opinion: this seems like an enterprise feature of the underlying tools. Do you know of or anticipate any artificial limits from that? E.g. is there a limit to the number of tenants that would actually make per-app tenancy impossible?

I think there are no limits on a technical level (max header size can be configured in most ingress controllers), but this would definitely add a lot more load to the read path: from what I understood of the code, Loki runs at least one query per tenant even if we know we are querying only one actual tenant. That also still depends on the time range of logs you want to retrieve, as Loki would most likely retrieve multiple chunks per tenant, and the number of read queries would become too high in that case.

Other than that, after some self-monologue, I'm starting to envision some of the technicalities of this.

What I think would make sense would be to allow customers to configure the tenancy they want at the cluster level (via a cluster label, for instance) with the allowed tenancy values of:

  • cluster - all logs apart from GS logs are mapped to 1 tenant. I would think this makes sense for small clusters where there is no real use for cross-namespace tenancy.
  • namespace - all logs that are part of a namespace are mapped to 1 tenant across all clusters (e.g. the prod namespace on all clusters would be in the same tenant).
  • cluster_namespace - all logs that are in 1 namespace in 1 cluster are in 1 tenant.
  • project - this would then depend on the Project CRs, so it's still unclear.

Logs from the kube-system, giantswarm and, I guess, kyverno namespaces would anyway be sent to the giantswarm tenant on our side, and platform teams would have access to the configured giantswarm tenant.

We can definitely allow exceptions here by creating a label like "application.giantswarm.io/tenant" that could be set on a namespace or a deployment to override the tenant for specific things, or maybe by adding the PodLogs CR that comes with the grafana-agent (see the sketch below).
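A sketch of what that could look like on a cluster and on an individual namespace; the cluster-level tenancy label key below is a placeholder, only `application.giantswarm.io/tenant` is the key named above:

```yaml
# Cluster-level choice of tenancy model (placeholder label key).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-1
  labels:
    observability.giantswarm.io/tenancy: namespace  # cluster | namespace | cluster_namespace | project
---
# Per-namespace (or per-deployment) override using the label mentioned above.
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    application.giantswarm.io/tenant: platform-team
```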

Now I'm a bit more concerned about the following things:

  • What do we do with managed apps that are not deployed in their own namespace or in kube-system? For instance, a lot of our customers install prometheus-operator in the monitoring namespace, as the kube-system namespace is inaccessible to teams, so that they can deploy their own prometheus. But then we can also go even deeper into that question. In this specific case, I'm not sure what the tenant should be, or whether we then need to store logs twice.

@giantswarm/team-atlas I would like your thoughts on this.

@stone-z (Contributor) left a comment

> How do you see workload identity having any relation to logging? I would be interested to know why you think they would be related. To me, tenancy is about who has access to the logs, so maybe the identity would then be a label in the log, but it should not define a user tenancy model.

Having workload identities would make it easier to manage the access rights for human (or machine) users to application logs. Take GS or a customer monitoring team as a potential group of humans. That group of humans should have access to prometheus logs across the fleet. But there are multiple types of prometheus apps and instances hanging out in the fleet. How do we tell them apart in a clear enough way that an operator can configure access rules reliably? It isn't a requirement now though, we just have multiple places to configure things.

Regarding the rest of your larger comment:
I think we're starting to get more on the same page. As long as there is no technical reason it isn't possible, we could let customers choose a tenancy model so that they/we can balance loki performance with the actual RBAC needs. I still think at least some will need app<-->tenant (see my other comment), and that ensures the widest coverage later on, but it could be turned down for customers that don't need it.

I also echoed some of your ideas about shared namespaces in my other comment. It might be useful to discuss all of this (identity/tenancy needs, GS Project vs dev abstractions) in the next SIG Architecture as well. I have lots of ideas about these at a platform level, but they haven't been refined outside Shield, and Atlas is now the first team with a real need.


@QuentinBisson (Contributor)

Sure, let's discuss it on Wednesday, but we might need to make this an hour-long meeting :D


The component `loki-multi-tenant-proxy`, which has now been renamed to `grafana-multi-tenant-proxy`, should be configured with the mapping between groups of people and tenants.


Maybe it would be worth explaining in a short sentence why the name has changed.
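For illustration only (the actual grafana-multi-tenant-proxy configuration schema isn't shown in this excerpt), the group-to-tenant mapping mentioned above could look along these lines:

```yaml
# Purely illustrative read-path mapping: which groups of people may read
# which tenants. Keys and layout are assumptions, not the real schema.
groups:
  - name: customer:payments-admins
    tenants:
      - team-payments
  - name: giantswarm:atlas
    tenants:
      - giantswarm
```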


This RFC has had no activity for 100 days. It will be closed in 1 week, unless activity occurs or the label lifecycle/keep is added.

@JosephSalisbury (Contributor)

@marieroque what's the status here?

@marieroque (Author)

I would say it's part of an epic: giantswarm/roadmap#3558
