Add progress status for partition rebalances #140

kyguy · 2024-11-26T07:09:17Z

This proposal introduces a new feature to monitor the progression of an ongoing partition rebalance executed by a Strimzi-managed Cruise Control instance via a KafkaRebalance custom resource. Implementation of this proposal should help to address strimzi/strimzi-kafka-operator#10278

Signed-off-by: Kyle Liberti <[email protected]>

scholzj · 2024-11-26T09:18:49Z

088-rebalance-progress-status

+### Progress Update Cadence
+
+For ease of implementation and minimizing the load on the CruiseControl REST API server, we would only query the CruiseControlState endpoint and update the “progress” section upon `KafkaRebalance` resource reconciliation. 
+The progress section will never be more out of date longer than the reconciliation period and even if the rebalance runs into an error or “NotReady” state, the “progress” section would still be updated on that KafkaRebalance resource reconciliation along with any error.


How do you avoid a tight reconciliation loop, given that an update to the status will trigger a new reconciliation, which will update the status, trigger another reconciliation, and so on?

That is a very good point. Maybe we need to record a timestamp of the last progress check and skip the update if less than one reconciliation period has passed since then?

The general rule is to not include things like that in the status. Using some timestamp for that would probably need to be handled when getting the progress data and not when updating the status, as that is shared code and it might be complicated to put it there.

"The progress section will never be more out of date longer than the reconciliation period": this part might not always be true. For example, if the CC REST API returned an error for some reason and the executor state could not be retrieved, would we wait for the next reconciliation to retry? In that case, the progress section would be out of date.

I should be able to use the existing timestamp in the metadata.managedFields[].time field of the KafkaRebalance resource to know when the resource was last updated, and then only update the progress section if that timestamp is older than the reconciliation period.

I don't think you should rely on metadata.managedFields[].time as those are internal Kubernetes fields with completely different purposes.

If we are not allowed to maintain a timestamp in the progress section specifying when it was last changed or rely on the metadata.managedFields[].time field of the custom resource then we will either have to find another way of tracking when the resource was last updated or try another approach for preventing tight reconciliation loops.

I'll see what I come up with and get back to you.

I'm not sure you cannot have a timestamp in the status. The question is how you work with the timestamp, how you use it, and when/how you update it. But in general, the easiest solution is to store the progress in a config map which you can simply update in every reconciliation, and as you don't watch it you do not need to be worried about what it triggers. Events might be another option for the progress tracking, maybe? I do not like them very much and I think they are pretty useless for tracking the restart events. But if you publish the events to the KafkaRebalance resource, it might be more useful than for Pods.

I'm not sure you cannot have a timestamp in the status. The question is how you work with the timestamp, how you use it, and when/how you update it.

If we were to maintain a timestamp in the status we would add it with the progress section upon initial creation, then update it on the next reconciliation when its value was older than reconciliation period. With a timestamp of when the progress was last changed, we could easily avoid triggering unwanted reconciliations.

But in general, the easiest solution is to store the progress in a config map which you can simply update in every reconciliation, and as you don't watch it you do not need to be worried about what it triggers.

This is a really interesting idea. TBH I hadn't thought of storing the progress information in a ConfigMap instead of the KafkaRebalance status. We were planning on maintaining a ConfigMap for executor state information anyway and yes, in this way we could avoid triggering reconciliations upon progress updates. We would still need to add a progress section with a reference to the ConfigMap but this would only need to be added/removed once per state change.

Although maintaining the progress information in the ConfigMap would be the simplest solution, I still feel that the UX of maintaining the progress information in the KafkaRebalance status would still be worth the added implementation complexity. Any thoughts on this @ppatierno @tomncooper ?

Events might be another option for the progress tracking, maybe? I do not like them very much and I think they are pretty useless for tracking the restart events. But if you publish the events to the KafkaRebalance resource, it might be more useful than for Pods.

I hadn't thought of using events either but I'll think more on this. My only concern for storing the progress in the events would be the UX of getting the progress information, the initial idea was that the progress information would be easily found and read by users in the KafkaRebalance resource.

If we were to maintain a timestamp in the status we would add it with the progress section upon initial creation, then update it on the next reconciliation when its value was older than reconciliation period. With a timestamp of when the progress was last changed, we could easily avoid triggering unwanted reconciliations.

Yes, you would need some custom logic such as if the timestamp is older than X minutes, update the progress. If not, just reuse the old progress.
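
A minimal sketch of that check, written in Python for brevity (the operator itself is Java). The function name, the `last_updated` argument, and the two-minute period are illustrative assumptions, not part of the proposal.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reconciliation period; the real value would come from operator configuration.
RECONCILIATION_PERIOD = timedelta(minutes=2)


def should_refresh_progress(last_updated, now, period=RECONCILIATION_PERIOD):
    """Return True when the stored progress is missing or at least one period old."""
    if last_updated is None:
        # No progress recorded yet, so it has to be fetched.
        return True
    return now - last_updated >= period


now = datetime(2024, 11, 26, 9, 30, tzinfo=timezone.utc)
fresh = now - timedelta(seconds=30)   # updated within the period: reuse old progress
stale = now - timedelta(minutes=5)    # older than the period: refresh
```

With a guard like this, a status update that triggers an immediate reconciliation finds a fresh timestamp and leaves the progress untouched, which breaks the loop.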

I hadn't thought of using events either but I'll think more on this. My only concern for storing the progress in the events would be the UX of getting the progress information, the initial idea was that the progress information would be easily found and read by users in the KafkaRebalance resource.

`kubectl describe kr` should show you the events, I think. Also, most UIs would normally show the events when you list the custom resource.

Discussed w/ Paolo and Tom; they agreed that storing the progress information in the ConfigMap would simplify the implementation and that doing so wouldn't significantly change the UX. Given the executor state information is already going to be stored in the ConfigMap, it probably makes the most sense to maintain our progress information there as well. Let me update the proposal to show what it would look like.

tomncooper

Had a first pass. A lot of my comments are optional style/grammar/formatting suggestions, so feel free to ignore them.

My main comments are:

- @scholzj makes a very good point about avoiding infinite reconciliation after a status update. You will need to solve that.
- I think we should include a minimum estimated time for optimization proposals. Even if it is a ballpark figure, it is a very useful guide. But let's see what others think.

tomncooper · 2024-11-26T11:55:58Z

088-rebalance-progress-status

Nit: Can you add the .md suffix so GH can apply the right syntax highlighting?

tomncooper · 2024-11-26T12:01:06Z

088-rebalance-progress-status

+
+In this “progress” section, we include the following fields:
+
+- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete. 


Judging from the formula used for this, this value is a prediction based on the past average data transfer rate. The rate could increase in the future, so this estimation is not a minimum.

tomncooper · 2024-11-26T12:01:39Z

088-rebalance-progress-status

+
+### Supported KafkaRebalance States
+
+For initial implementation we will focus on including the “progress” section only in the following KafkaRebalance states:


Suggested change

For initial implementation we will focus on including the “progress” section only in the following KafkaRebalance states:

For the initial implementation, we will focus on including the “progress” section only in the following KafkaRebalance states:

tomncooper · 2024-11-26T12:02:38Z

088-rebalance-progress-status

+
+helps users understand the cost of an ongoing partition rebalance, decide whether or not they should continue or cancel it, and know when future operations will be able to be safely executed.
+
+Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance. 


Suggested change

Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.

Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources, allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.

tomncooper · 2024-11-26T12:03:50Z

088-rebalance-progress-status

+- How much time an ongoing partition rebalance has left to take
+- How much data an ongoing partition rebalance has left to transfer 
+
+helps users understand the cost of an ongoing partition rebalance, decide whether or not they should continue or cancel it, and know when future operations will be able to be safely executed.


Style Nit: I am not sure the bullet points make things clearer? Feels a bit disjointed when you read it. Maybe just make this a single sentence?

tomncooper · 2024-11-26T12:39:01Z

088-rebalance-progress-status

+
+#### Adding “progress” section for other KafkaRebalance states
+
+In addition to the “progress” the “Rebalancing” and “Stopped” KafkaRebalance states, we could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states. 


I have previously suggested putting these states in code quotes (``), but double-quotes is fine too, so long as they are consistent throughout the doc.

tomncooper · 2024-11-26T12:39:51Z

088-rebalance-progress-status

+
+In addition to the “progress” the “Rebalancing” and “Stopped” KafkaRebalance states, we could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states. 
+Firstly, this would help emphasize that a rebalance had not started or had completed by having a percentageComplete: 0% on "ProposalReady" and a percentageComplete: 100% on "Ready". 
+This emphasis could help clear up ambiguity surrounding what the KafkaRebalance “Ready” state or “optimizationResult” field means. 


That would be nice.

tomncooper · 2024-11-26T12:40:32Z

088-rebalance-progress-status

+This feature would be of great value to users. 
+However, providing an accurate estimation for this is non-trivial, namely the “estimatedTimeToCompletion” field for “ProposalReady" state, is non-trivial. 
+
+Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker balances. 


Suggested change

Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker balances.

Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker movements.

tomncooper · 2024-11-26T12:42:37Z

088-rebalance-progress-status

+# The maximum number of partition movements given CC partition movement cap
+max_partition_movements= min(<# of brokers> *  
+    num.concurrent.partition.movements.per.broker) 
+max_partition_movements=min(max_partition_movements, max.num.cluster.partition.movements)


Why are these two variables called the same thing?

I was wondering the same thing, like are we overwriting the previous value? I think a different name would be better.

Fixed with the reformatting
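
For illustration, a sketch of how the two caps could be kept under distinct names. The function and variable names are hypothetical; only the two Cruise Control settings named in the comments come from the quoted snippet.

```python
def max_concurrent_partition_movements(num_brokers, per_broker_cap, cluster_cap):
    """Upper bound on concurrent partition movements under both caps."""
    # Cap implied by num.concurrent.partition.movements.per.broker across all brokers.
    broker_limited_movements = num_brokers * per_broker_cap
    # The cluster-wide cap max.num.cluster.partition.movements bounds the result.
    return min(broker_limited_movements, cluster_cap)
```

Using a separate name for the intermediate value makes it clear that nothing is being overwritten.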

tomncooper · 2024-11-26T12:45:13Z

088-rebalance-progress-status

+estimatedTimeToCompletion = intraBrokerDataToMoveMB / throughput
+```
+
+Given that its inclusion is not completely necessary and adds significant complexity to the proposal, it is out of scope for this proposal.


We could just ignore intra-broker movements and just base the estimate on inter-broker movements (they will take up the bulk of the time anyway). We could document it as a theoretical minimum and state that it will take longer than this. But it would give a ball park estimate, which is better than the current situation.

We could just ignore intra-broker movements and just base the estimate on inter-broker movements (they will take up the bulk of the time anyway).

Assuming the disk throughput is always faster than the network throughput!

We could document it as a theoretical minimum and state that it will take longer than this. But it would give a ball park estimate, which is better than the current situation.

Let me think more on this

Still looking into this, investigating possible alternatives and hacking it in the prototype to gauge how complicated it would be to implement. If it isn't too complicated, I'll add it into this proposal and we can aim for supporting all KafkaRebalance states in one go

fvaleri

Thanks @kyguy, this seems to be useful.

I left a few comments for your consideration. Please also fix the formatting.

fvaleri · 2024-12-02T11:31:03Z

088-rebalance-progress-status

+[1] The “progress” section will be visible during the KafkaRebalance “Rebalancing” and  “Stopped” states.
+[2] The minimum estimated time it will take the rebalance to complete.
+[3] The percentage complete of the ongoing rebalance in the range [0-100]%
+[4] The ConfigMap where “non-verbose” JSON payload from Executor State from CruiseControlState endpoint is stored.


Why do we need to store the state in a config map? Maybe we could simply document how to recover that from the REST endpoint in case it is needed for troubleshooting.

fvaleri · 2024-12-02T11:32:30Z

088-rebalance-progress-status

+##### Rebalancing
+
+```
+rate = (finishedDataMovement)/(<task_trigger_time> - <current_time>)


In order to avoid mistakes, we should always specify the unit in the variable's name (e.g. finishedDataMovementMB).
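
A sketch of what unit-suffixed names could look like for this calculation. The names, the `total_data_to_move_mb` input, and the remaining-time step are assumptions for illustration; elapsed time is taken as current time minus trigger time so that the rate is positive.

```python
def estimate_minutes_remaining(finished_data_movement_mb, total_data_to_move_mb,
                               task_trigger_time_s, current_time_s):
    """Estimate remaining minutes from the average transfer rate so far."""
    elapsed_s = current_time_s - task_trigger_time_s
    if elapsed_s <= 0 or finished_data_movement_mb <= 0:
        # No data moved yet (or no time elapsed), so no rate can be derived.
        return None
    rate_mb_per_s = finished_data_movement_mb / elapsed_s
    remaining_mb = max(total_data_to_move_mb - finished_data_movement_mb, 0)
    return remaining_mb / rate_mb_per_s / 60
```

With the unit in every name, a mix-up between seconds and minutes or MB and bytes becomes visible at the call site.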

fvaleri · 2024-12-02T11:36:54Z

088-rebalance-progress-status

+When querying the Executor State of the CruiseControlState endpoint directly, we have the option to add a “verbose” parameter to request additional information surrounding the state. 
+The additional information could be of interest to third-party UI tools for exposing more details of a rebalance or to users debugging a problematic rebalance at the partition level. 
+However, to reduce the complexity of this initial enhancement, we have chosen not to use the “verbose” parameter. 
+One concern is that some of the fields like the “pendingParitionMovements” field can cause the JSON output to grow quite large. 


Suggested change

One concern is that some of the fields like the “pendingParitionMovements” field can cause the JSON output to grow quite large.

One concern is that some of the fields like the “pendingPartitionMovements” field can cause the JSON output to grow quite large.

katheris

Generally the proposal looks good to me. I agree with the comments from others and just had one comment about the field name of percentageComplete and a suggestion for an additional field we could include

katheris · 2024-12-02T14:47:51Z

088-rebalance-progress-status

+	provisionRecommendation: ""
+	provisionStatus: RIGHT_SIZED
+	recentWindows: 1
+  progress:


Suggested change

progress:

progress: [1]

katheris · 2024-12-02T14:59:10Z

088-rebalance-progress-status

+In this “progress” section, we include the following fields:
+
+- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete. 
+- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%


Based on the calculations listed below, I wonder if we should be explicit that this is the percentage based on the data movement, rather than the percentage of partitions done. We could also consider adding a separate field for percentagePartitionMovementComplete, but it depends on whether that would be interesting to people or not.

Suggested change

- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%

- percentageDataMovementComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%
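
To illustrate why the distinction matters, a small sketch comparing the two percentages. The helper and the sample numbers are hypothetical, and returning 0 when nothing is scheduled is an arbitrary choice made for this example.

```python
def completed_percentage(finished, total):
    """Whole-number percentage of `finished` out of `total`, clamped to 0..100."""
    if total <= 0:
        # Assumption for this sketch: nothing scheduled is reported as 0.
        return 0
    return min(100, max(0, round(100 * finished / total)))


# Hypothetical snapshot: most of the data has moved, but only half of the partitions.
data_movement_pct = completed_percentage(800, 1000)   # MB moved / MB to move
partition_movement_pct = completed_percentage(30, 60) # partitions moved / partitions to move
```

When partition sizes are uneven the two values can diverge considerably, so the field name should say which one is reported.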

ppatierno · 2024-12-09T08:12:02Z

088-rebalance-progress-status

+
+- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete. 
+- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%
+- rebalanceProgressConfigMap: The ConfigMap where “non-verbose” JSON payload from Executor State from CruiseControlState endpoint is stored.


I would not reference the internal class CruiseControlState that represents this endpoint, but rather the real user-facing REST endpoint, i.e. /kafkacruisecontrol/state?substates=executor

ppatierno · 2024-12-09T08:15:48Z

088-rebalance-progress-status

+We could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states but it is not completely necessary, nor is it trivial. 
+Further explanation as to why that is and why it should be saved as a future improvement is explained in the Future Improvements section near the bottom of this proposal.
+
+All information required for estimating the values of  “estimatedTimeToCompletion” and  “percentageComplete” fields can be derived from either Cruise Control server configurations or CruiseControlState endpoint. 


Again, let's refer to the user-facing REST endpoint, not the CruiseControlState class.

ppatierno · 2024-12-09T08:24:46Z

088-rebalance-progress-status

+Further explanation as to why that is and why it should be saved as a future improvement is explained in the Future Improvements section near the bottom of this proposal.
+
+All information required for estimating the values of  “estimatedTimeToCompletion” and  “percentageComplete” fields can be derived from either Cruise Control server configurations or CruiseControlState endpoint. 
+That being said, the method of estimation for these fields depends on the state of the KafkaRebalance resource.


Maybe a language issue on my side but what do you mean by the "method of estimation ... depends on the state ..."?

ppatierno · 2024-12-09T08:59:18Z

088-rebalance-progress-status

+##### Stopped
+
+Once a rebalance has been stopped, it cannot be completed. 
+Therefore, there is no “estimationTimeToCompletion” for a stopped rebalance, so we set estimatedTimeToCompletion = null to emphasize this. 


What does it mean we set estimatedTimeToCompletion = null in terms of custom resource? Do you really want something like estimatedTimeToCompletion: null? Maybe N/A or just removing the field? @tomncooper wdyt?

ppatierno · 2024-12-09T09:01:00Z

088-rebalance-progress-status

+
+#### rebalanceProgressConfigMap
+
+Will only be present in “Rebalancing” and “Stopped” states.


Does it mean that the ConfigMap is deleted when the rebalance is in the other states? Will this field be just removed from the progress as well? should we make it clearer?

Added a line to make this clearer

ppatierno · 2024-12-09T09:05:39Z

088-rebalance-progress-status

+[1] The “progress” section will be visible during the KafkaRebalance “Rebalancing” and  “Stopped” states.
+[2] The minimum estimated time it will take the rebalance to complete.
+[3] The percentage complete of the ongoing rebalance in the range [0-100]%
+[4] The ConfigMap where “non-verbose” JSON payload from Executor State from CruiseControlState endpoint is stored.


@fvaleri which kind of troubleshooting are you talking about? The verbose information stored in the ConfigMap comes from the state?substates=executor endpoint, which only has data while something is running; otherwise it just returns NO_TASK_IN_PROGRESS, so in case of issues you can't get anything interesting from there AFAIK.

ppatierno · 2024-12-09T09:22:15Z

088-rebalance-progress-status

+
+The “non-verbose” JSON payload from the ExecutorState is already too verbose to include in the `KafkaRebalance` status in its entirety. 
+However, having the information available to users is still useful especially when debugging the state of a partition rebalance. 
+Therefore, we will store the JSON payload in its own ConfigMap, “rebalanceProgressConfigMap”. 


We should also define the name of this ConfigMap. It's not configurable and users cannot decide this name. I think it will be something pre-formatted, derived from the KafkaRebalance name?

ppatierno · 2024-12-09T09:27:46Z

088-rebalance-progress-status

+
+Given that its inclusion is not completely necessary and adds significant complexity to the proposal, it is out of scope for this proposal.
+
+#### Configurable verbosity for Executor State


I am not sure we want to mention this and also having it as a future improvement. Today, we cannot specify verbose when getting the proposal as well. It's not exposed to the user.

Are we worried that users might ask for it if it is included in the proposal? Would we ever want to provide the verbose optimization proposal to the user in the future?

I don't know the answer to either question. We can always create a new proposal at some point for the verbosity configuration if someone comes to us and asks for it. I would just avoid making a commitment for the future right now. @tomncooper wdyt?

What if we moved it to the "Rejected Alternatives" section? This way we avoid the commitment and we have the reasons why it was rejected documented there in case users ask for it in the future.

TBH I don't mind stripping the section out completely if we are worried keeping it will result in user fixation or confusion!

I don't think it would be a rejected alternative, just an addition. I am for removing this section.

Signed-off-by: Kyle Liberti <[email protected]>

scholzj · 2024-12-18T16:12:27Z

088-rebalance-progress-status.md

+
+## Proposal
+
+This proposal extends the status section of the `KafkaRebalance` custom resource to include a `progress` section with a nested `rebalanceProgressConfigMap` field that references a `ConfigMap` that contains information related to an ongoing partition rebalance.


Do we need a new CM? Can't we use the existing one from the proposal?

We were considering using the existing ConfigMap, "afterBeforeLoadConfigMap" to store this progress information, but were concerned the additional data would contribute to hitting the 1 MB ConfigMap limit sooner. It is not so much of an issue for the constant "non-verbose" executor state information we plan on providing as part of this proposal. However, if we were to extend the feature to provide the variable "verbose" executor state information in the future, it would increase the chance of hitting the limit for larger production clusters that have a larger number of brokers and partitions.

If we have no requests/plans for providing "verbose" executor state information in the future, I don't see much of a problem with storing the information in the existing ConfigMap. At the least, it would simplify the proposal implementation. Any thoughts @ppatierno @tomncooper?

As mentioned before, I am not sure we'll ever have support for "verbose", so maybe we should think about the present and not the future. From this perspective, using the same ConfigMap seems reasonable.
Anyway you can keep the rebalanceProgressConfigMap field pointing to that ConfigMap.
When/if one day we have support for "verbose", that field will just point to a different ConfigMap. It should not be a big issue for users.

You don't have to use the same CM just because I asked about it.

No I just think it's a good compromise. Anyway let's see what @kyguy @tomncooper think?

Discussed offline with Paolo and Tom: since the progress information is constant, we can safely add it to the existing ConfigMap maintained for and tied to the KafkaRebalance resource. This keeps KafkaRebalance information organized in one place, simplifies the proposal implementation, and has an insignificant impact on the storage of the ConfigMap. Refactored and added this note to the proposal.

scholzj · 2024-12-18T16:16:10Z

088-rebalance-progress-status.md

+  - lastTransitionTime: "2024-11-05T15:28:23.995129903Z"
+    status: "True"
+    type: Rebalancing
+    message: "Failed to retrieve rebalance progress"   


This should be probably a separate warning condition and not be part of the rebalancing condition as it is not clear what it means really?

Yes, you are right, just updated this to a "Warning condition"

scholzj · 2024-12-18T16:21:09Z

088-rebalance-progress-status.md

+    rebalanceProgressConfigMap: my-rebalance-progress
+```
+
+### Future Improvements


Is this chapter empty? Or do you need to fix the headers from here down?

ppatierno · 2024-12-19T14:37:57Z

088-rebalance-progress-status.md

+We could provide this information for other states as well, such as the `ProposalReady` and `Ready` states, but it is not completely necessary, nor is it trivial. 
+Further discussion on the inclusion of the progress information for these other states can be found in the [Future Improvements](#future-improvements) section near the bottom of this proposal.
+
+All the information required for estimating the values of `estimatedTimeToCompletion` and `percentageDataMovementComplete` fields can be derived from either the Cruise Control server configurations or the [/kafkacruisecontrol/state?substates=executor](https://github.com/linkedin/cruise-control/wiki/REST-APIs#query-the-state-of-cruise-control) REST API endpoint.


"can be derived from either the Cruise Control server configurations or the REST API endpoint." ... This sounds like we have a choice of where to estimate the values from, while the proposal should be clear about which one we are using, which I guess is the REST API endpoint, right?

Updated "or" -> "and" now that the ProposalReady state is being implemented as part of the proposal. This statement now makes more sense since the ProposalReady state estimations depend on the information from the Cruise Control server configurations, while the other states depend on the information from the /kafkacruisecontrol/state?substates=executor endpoint.

ppatierno · 2024-12-19T14:41:07Z

088-rebalance-progress-status.md

+
+$$
+\text{executorState} = \langle \text{Previous JSON payload from "/kafkacruisecontrol/state?substates=executor" endpoint} \rangle
+$$


what's the value of showing the above formulas? I mean we are just saying the executorState field will contain the above JSON returned by the state endpoint. I think we already mentioned it a few times, or?

ppatierno · 2024-12-19T14:44:54Z

088-rebalance-progress-status.md

+
+it is best if we maintain the progress information somewhere else. 
+
+#### Including “ExecutorState” in “afterBeforeLoadConfigmap”


if we go with using the same CM we have to remove the section from here.

tomncooper · 2024-12-19T14:57:13Z

088-rebalance-progress-status.md

+- `Stopped`
+
+These are the states where this progress information will be able to be most accurately calculated and most useful for users. 
+We could provide this information for other states as well, such as the `ProposalReady` and `Ready` states, but it is not completely necessary, nor is it trivial. 


I still argue that having a lower bound for the estimated time of a KafkaRebalance in the ProposalReady state, based on the proposed data to move and the throttle settings, would be a useful thing to have. It would obviously not be accurate, as there is no disk throughput information available.

But it is still a useful guide and helps users gauge the impact of a proposed rebalance, which data-to-move values alone don't give.
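
A rough sketch of such a lower bound, assuming only a total amount of data to move and a single effective throughput figure. Both parameter names are hypothetical, and how the throughput would be derived from the throttle or capacity settings is left open here.

```python
import math


def lower_bound_minutes(data_to_move_mb, throughput_mb_per_s):
    """Theoretical minimum duration in whole minutes, rounded up."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    # Ignores disk throughput and intra-broker movements, so real runs take longer.
    return math.ceil(data_to_move_mb / throughput_mb_per_s / 60)
```

For example, 12000 MB at 10 MB/s gives a floor of 20 minutes; the actual rebalance would be expected to take at least that long.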

tomncooper · 2024-12-19T15:00:16Z

088-rebalance-progress-status.md

+$$
+
+Notes
+- [1] `finishedDataMovement` is the number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint.


You don't need the numerical indicators ([1]) in the formulas. You state the full name of the variable anyway so they don't help.

fvaleri · 2024-12-20T10:46:36Z

088-rebalance-progress-status.md

+Knowing things like how much time an ongoing partition rebalance has left to take and how much data an ongoing partition rebalance has left to transfer helps users understand the cost of an ongoing partition rebalance.
+This information helps users decide whether they should continue or cancel an ongoing rebalance, and know when future operations will be able to be safely executed.
+
+Further, having this information readily available and easily accessible via Kubernetes primitives, allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.


Can you add a simple example of how the Kubernetes CLI could be used to get the Rebalance progress information?

"Strimzi Console"? I think you mean the StreamsHub Console.

fvaleri · 2024-12-20T11:09:20Z

088-rebalance-progress-status.md

+[2] The `ConfigMap` containing information related to the ongoing partition rebalance, generated with the name "<kafka_rebalance_resource_name>-progress".
+
+In the `ConfigMap`, we will include the following fields:
+- **estimatedTimeToCompletion**: The estimated amount time it will take in minutes until partition rebalance is complete.


Like in optimizationResult, I think that it would be good to add a unit suffix to this key, i.e. estimatedCompletionTimeInMinutes. IMO, we should follow this pattern in general, and avoid using unit suffixes in values. This would also make the formulas self explanatory.

fvaleri · 2024-12-20T11:15:19Z

088-rebalance-progress-status.md

+
+In the `ConfigMap`, we will include the following fields:
+- **estimatedTimeToCompletion**: The estimated amount time it will take in minutes until partition rebalance is complete.
+- **percentageDataMovementComplete**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]%


Should this be named completedDataMovementPercentage, similar to monitoredPartitionsPercentage in optimizationResult?

fvaleri · 2024-12-20T11:29:10Z

088-rebalance-progress-status.md

+   "triggeredUserTaskId":"0230d401-6a36-430e-9858-fac8f2edde93"
+  }
+```
+[1] The estimated time it will take the rebalance to complete based on the average rate of data transfer.


Which time unit? There are other places in which the time unit is not defined.

Signed-off-by: Kyle Liberti <[email protected]>

scholzj · 2024-12-23T20:20:11Z

088-rebalance-progress-status.md

+  estimatedTimeToCompletionInMinutes: 5m [1]
+  completedDataMovementPercentage: 80% [2]


You should either have the unit in the value and have users parse it or have it in the name and use an integer / double only. I'm fine with both ways, but you should pick one.

My view is ...
For the completedDataMovementPercentage field it doesn't make much sense to have the % symbol in the value, so I would just remove it and have completedDataMovementPercentage: 80.
Regarding estimatedTimeToCompletionInMinutes, it depends on whether we want the flexibility of showing the value in a different unit, but I don't see any value in that. I mean, we could have estimatedTimeToCompletion: 300000ms or estimatedTimeToCompletion: 5m to say the same thing, but does it really make sense? For this reason I would lean towards just estimatedTimeToCompletionInMinutes: 5. Rebalancing is a long process, and showing 1 minute or just 0 minutes for a remaining time of less than a minute could make sense (instead of something like estimatedTimeToCompletion: 36s).
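
To illustrate the consumer side of this choice, here is a minimal Python sketch. It assumes the progress `ConfigMap` data has already been fetched as a dictionary of strings (Kubernetes stores `ConfigMap` data values as strings). The function name and the returned keys are hypothetical and not part of the proposal.

```python
def parse_progress(data):
    """Convert ConfigMap string values into integers; anything non-numeric becomes None."""

    def to_int(value):
        try:
            return int(value)
        except (TypeError, ValueError):
            # Covers missing keys and values such as "5m", "80%" or "N/A".
            return None

    return {
        "minutes_remaining": to_int(data.get("estimatedTimeToCompletionInMinutes")),
        "percent_complete": to_int(data.get("completedDataMovementPercentage")),
    }


# Unit carried by the key name, plain number in the value.
example = {
    "estimatedTimeToCompletionInMinutes": "5",
    "completedDataMovementPercentage": "80",
}
progress = parse_progress(example)
```

With the unit in the key, a client only needs `int()`; with values such as `5m` or `80%`, every client would have to strip and interpret the suffix itself.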

scholzj · 2024-12-23T20:22:10Z

088-rebalance-progress-status.md

+- `ProposalReady`
+- `Rebalancing`
+- `Stopped`


What will be the actual values in these states?

ProposalReady -> 0% completion and the estimated time counted from the moment it is approved?

Rebalancing -> up-to-date information?

Stopped -> the last info before it was stopped?

Ready -> 100% and 0 minutes remaining?

Maybe you can describe it here in bullet points in a human readable form and leave the formulas below for experts.

I think a summary like this would be useful

scholzj · 2024-12-23T20:22:46Z

088-rebalance-progress-status.md

+  name: my-rebalance
+  …
+data:
+  estimatedTimeToCompletionInMinutes: 5m [1]


How is the estimation counted? Is it reliable? How much is it affected by the issues with unknown real network capacity?

How is the estimation counted?

Depends on the KafkaRebalance state, the specific details per state are in the "Field: estimatedTimeToCompletionInMinutes" section of the proposal.

Is it reliable?

In general, yes. The values for the Stopped and Ready states are hardcoded, and the value for the Rebalancing state is based on the average rate of data transfer and is easily calculated without the need for any user or capacity settings. The only potentially problematic case is the estimation for the ProposalReady state, which relies on accurate network capacity configuration from the user.

How much is it affected by the issues with unknown real network capacity?

If the default or user-configured network capacity is very different from the real network capacity, the estimation for the ProposalReady state could be inaccurate. If the real network capacity is underestimated, the rebalance could take much less time than estimatedTimeToCompletionInMinutes to complete. If the real network capacity is overestimated, the rebalance could take much more time than estimatedTimeToCompletionInMinutes to complete. The latter case wouldn't be as much of an issue, as we advertise estimatedTimeToCompletionInMinutes to be a theoretical minimum estimation in the ProposalReady state. However, the former case would be an issue, since the estimatedTimeToCompletionInMinutes value wouldn't be a theoretical minimum.
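
As a rough worked example of the two cases, here is a simplified Python sketch. It assumes the ProposalReady estimate reduces to "data to move divided by bandwidth" and ignores throttles and concurrency limits; the function name, the MB/s unit and all numbers are made up for illustration and are not taken from the proposal's formulas.

```python
def minimum_minutes(total_mb, bandwidth_mb_per_s):
    """Best-case transfer time in minutes for a given bandwidth (simplified)."""
    return total_mb / bandwidth_mb_per_s / 60


TOTAL_MB = 60_000        # hypothetical amount of data the proposal wants to move
CONFIGURED_MB_S = 100    # hypothetical capacity configured by the user

advertised = minimum_minutes(TOTAL_MB, CONFIGURED_MB_S)

# Capacity overestimated: the real network is slower than configured,
# so the rebalance takes longer and the advertised value is still a minimum.
actual_when_overestimated = minimum_minutes(TOTAL_MB, 25)

# Capacity underestimated: the real network is faster than configured,
# so the rebalance can finish before the advertised "minimum".
actual_when_underestimated = minimum_minutes(TOTAL_MB, 400)

print(advertised, actual_when_overestimated, actual_when_underestimated)
```

In this made-up scenario the advertised value is 10 minutes, while the best case on the real network is 40 minutes in the overestimated case and 2.5 minutes in the underestimated case. Only the second case breaks the "theoretical minimum" promise.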

To avoid issues like this, the current plan is to document that users must provide accurate network capacity settings to have accurate estimatedTimeToCompletionInMinutes values in the ProposalReady state. We already document that users must provide accurate network capacity settings to have accurate rebalances based on network capacity and distribution anyway.

To avoid issues like this, the current plan is to document that users must provide accurate network capacity settings to have accurate estimatedTimeToCompletionInMinutes values in the ProposalReady state. We already document that users must provide accurate network capacity settings to have accurate rebalances based on network capacity and distribution anyway.

I'm not sure this is a real solution. Do you really believe they configure the accurate network capacity? Do we even know how they would find out the accurate network capacity? Or will we solve it on paper but 99% of users will have it misconfigured and these numbers will be useless?

Do you really believe they configure the accurate network capacity?

Users that are serious about their network resource usage and distribution do.

Do we even know how they would find out the accurate network capacity?

I imagine they would use K8s CNI plugins or network performance benchmark tools.

Or will we solve it on paper but 99% of users will have it misconfigured and these numbers will be useless?
I'm not sure this is a real solution.

For users that have network capacity properly configured, this feature is still useful. I admit that I don't know how many Strimzi users configure their network capacity settings but I would like to believe that those that are doing it are doing it accurately. In addition to the network capacity documentation, what if we were to only include this estimation in the ProposalReady state for users that explicitly configured their network capacity settings? Would that be a more reasonable solution?

In addition to the network capacity documentation, what if we were to only include this estimation in the ProposalReady state for users that explicitly configured their network capacity settings? Would that be a more reasonable solution?

I guess it could be a viable solution. If the user is setting the network capacity I would assume they know what to put there, if not and they put a wrong/bad value, they should know that it's going to screw up the estimation. The documentation should state that. If we think it's not a viable solution then we should remove the estimatedTimeToCompletionInMinutes from the overall proposal. But I am for taking it and documenting it properly.

@tomncooper wdyt about the above discussion?

IMO the ProposalReady estimation is useful, because what really matters to users is to know if it would take minutes, hours, or days (see Windows file copy). If we could compute the average bandwidth from Kafka metrics, then we could use this value to provide a more accurate estimation independently of the user configuration.

scholzj · 2024-12-23T20:25:53Z

088-rebalance-progress-status.md

+
+The estimated time it will take in minutes for a rebalance to complete based on the average rate of data transfer.
+
+The formulas used to calculate field value per `KafkaRebalance` state:


So the formulas here ... will the CO calculate these? Or does CC calculate this and we just show the numbers?

It's the CO calculating these.

Added a sentence in the section above this sentence to make this more clear

"All the information required for the Cluster Operator to estimate the values of estimatedTimeToCompletionInMinutes and completedDataMovementPercentage fields"

PaulRMellor

The proposal clearly outlines the approach for providing progress status and how the information necessary for the calculations will be provided. I’ve left a couple of questions for clarification and some minor suggestions for consistency.

PaulRMellor · 2025-01-02T11:49:36Z

088-rebalance-progress-status.md

+  progress: [1]
+    rebalanceProgressConfigMap: my-rebalance [2]
+```
+[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states.


Why Stopped but not PausedReconciliation or NotReady? Should we explain here?

PausedReconciliation is not a valid rebalancing state.

Why Stopped but not PausedReconciliation or NotReady? Should we explain here?

The PausedReconciliation and NotReady states are not related to the rebalance operation but more to the proposal generation. Therefore, these states don't have any rebalance progress information associated with them.

The PausedReconciliation and NotReady states are not related to the rebalance operation but more to the proposal generation.

Well, actually even during a rebalancing you can get errors from CC and the KafkaRebalance ends in the NotReady state, right? But PausedReconciliation is not a valid rebalancing state at all.

Well, actually even during a rebalancing you can get errors from CC and the KafkaRebalance ends in the NotReady state, right?

Ah, yes, you are right: any configuration or rebalance error will put the KafkaRebalance resource in the NotReady state. Sorry @PaulRMellor, I was incorrect. NotReady should be supported as well, for the same reason the Stopped state is supported: to show how far the rebalance got before it failed. That was a nice spot!

But PausedReconciliation is not a valid rebalancing state at all.

Is that because it is related to the resource and not the rebalance itself? What determines whether it is a valid rebalancing state? I am confused because there is an enum for PausedReconciliation listed in the KafkaRebalanceState class [1]

[1] https://github.com/strimzi/strimzi-kafka-operator/blob/0.45.0/api/src/main/java/io/strimzi/api/kafka/model/rebalance/KafkaRebalanceState.java#L77

Is that because it is related to the resource and not the rebalance itself? What determines whether it is a valid rebalancing state? I am confused because there is an enum for PausedReconciliation listed in the KafkaRebalanceState class [1]

But it's ReconciliationPaused not PausedReconciliation! :-P
Joking apart ... I was trying to defend myself because I totally missed this state in the rebalance FSM :-D

I didn't know about it either until Paul mentioned it!

PaulRMellor · 2025-01-02T11:49:49Z

088-rebalance-progress-status.md

+```
+[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states.
+
+[2] The `ConfigMap` containing information related to the ongoing partition rebalance


Suggested change

[2] The `ConfigMap` containing information related to the ongoing partition rebalance

[2] The `ConfigMap` containing information related to the ongoing partition rebalance.

PaulRMellor · 2025-01-02T12:03:36Z

088-rebalance-progress-status.md

+In the `ConfigMap`, we will add the following fields:
+- **estimatedTimeToCompletionInMinutes**: The estimated amount time it will take in minutes until partition rebalance is complete.
+- **completedDataMovementPercentage**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]%
+- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint.


Maybe we should be more specific about the contents of the executorState field?

I would not provide a detailed list of everything, maybe just a link to the OpenAPI definition of it on the Cruise Control repo?

Yeah, I agree you should give a brief summary of what this is and link to the OpenAPI def upstream.

Added the link and text suggested by Paul which should be fair compromise

PaulRMellor · 2025-01-02T12:14:35Z

088-rebalance-progress-status.md

+In the `ConfigMap`, we will add the following fields:
+- **estimatedTimeToCompletionInMinutes**: The estimated amount time it will take in minutes until partition rebalance is complete.
+- **completedDataMovementPercentage**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]%
+- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint.


Suggested change

- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint.

- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint, providing details about the executor's current status, including partition movement progress, concurrency limits, and total data to move.

PaulRMellor · 2025-01-02T12:15:42Z

088-rebalance-progress-status.md

+```
+[1] The estimated time it will take in minutes for the rebalance to complete based on the average rate of data transfer.
+
+[2] The percentage complete of the ongoing rebalance in the range [0-100]%


Suggested change

[2] The percentage complete of the ongoing rebalance in the range [0-100]%

[2] The percentage complete of the ongoing rebalance in the range [0-100]%.

PaulRMellor · 2025-01-02T14:05:06Z

088-rebalance-progress-status.md

+
+The percentage of the data movement of the partition rebalance that is completed.
+
+The formulas used to calculate field value per  `KafkaRebalance` state:


Suggested change

The formulas used to calculate field value per `KafkaRebalance` state:

The formulas used to calculate the field value differ for each applicable `KafkaRebalance` state:

PaulRMellor · 2025-01-02T14:23:18Z

088-rebalance-progress-status.md

+  progress:
+    rebalanceProgressConfigMap: my-rebalance-progress
+```
+[1] Error message from failed Cruise Control REST API call


Suggested change

[1] Error message from failed Cruise Control REST API call

[1] Error message from failed Cruise Control REST API call.

PaulRMellor · 2025-01-02T14:28:02Z

088-rebalance-progress-status.md

+  conditions:
+  - lastTransitionTime: "2024-11-05T15:28:23.995129903Z"
+    status: "True"
+    type: Warning


Should we call out this property to highlight and distinguish between NotReady and (new?) Warning?

The type of condition in the status could be any one of the KafkaRebalance states, it could also be Warning. Why would the NotReady type be tied/associated to the Warning type?

PaulRMellor · 2025-01-02T14:28:58Z

088-rebalance-progress-status.md

+### Accessing progress fields using Kubernetes CLI
+
+The progress information will be stored in a `ConfigMap` with the same name as the `KafkaRebalance` resource.
+Using the name of the ConfigMap we can view its data from the command line using the Kubernetes CLI.


Suggested change

Using the name of the ConfigMap we can view its data from the command line using the Kubernetes CLI.

Using the name of the `ConfigMap`, we can view its data from the command line using the Kubernetes CLI.

PaulRMellor · 2025-01-02T14:30:19Z

088-rebalance-progress-status.md

+
+### Rejected Alternatives
+
+#### Maintaining progress fields in KafkaRebalance resource status


Suggested change

#### Maintaining progress fields in KafkaRebalance resource status

#### Maintaining progress fields in `KafkaRebalance` resource status

ppatierno

@kyguy I had a pass leaving some comments.
I think this proposal is missing the usual "Affected/not affected projects" section.

ppatierno · 2025-01-07T09:20:26Z

088-rebalance-progress-status.md

@@ -0,0 +1,374 @@
+# Partition Rebalance Progress Status


Should be more "Cluster rebalance progress status" (or rebalancing) ... not sure if "partition" (even using the singular) sounds really fine because it's about a cluster rebalancing and about moving one or more partitions across the cluster.

Updated to "Adding progress updates for Cruise Control rebalances".

ppatierno · 2025-01-07T09:24:23Z

088-rebalance-progress-status.md

+Knowing things like how much time an ongoing partition rebalance has left to take and how much data an ongoing partition rebalance has left to transfer helps users understand the cost of an ongoing partition rebalance.
+This information helps users decide whether they should continue or cancel an ongoing rebalance, and know when future operations will be able to be safely executed.
+
+Further, having this information readily available and easily accessible via Kubernetes primitives, allows users and third-party tools like the Kubernetes CLI or StreamsHub Console to easily track the progression of a partition rebalance.


How many people in the Strimzi community would know about the "StreamsHub Console"? Maybe it deserves a link to the repo or not mentioning it at all?

ppatierno · 2025-01-07T09:26:01Z

088-rebalance-progress-status.md

+## Proposal
+
+This proposal extends the status section of the `KafkaRebalance` custom resource to include a `progress` section with a nested `rebalanceProgressConfigMap` field.
+This field will reference the `KafkaRebalance`'s existing `ConfigMap`, which will be enhanced to contain information related to an ongoing partition rebalance.


"the KafkaRebalance's existing ConfigMap" ... maybe we should mention this ConfigMap beforehand to explain what it contains currently and from where it is already referenced (see afterBeforeLoadConfigMap field).

ppatierno · 2025-01-07T09:29:36Z

088-rebalance-progress-status.md

+
+The estimated time it will take in minutes for a rebalance to complete based on the average rate of data transfer.
+
+The formulas used to calculate field value per `KafkaRebalance` state:


It's the CO calculating these.

ppatierno · 2025-01-07T09:37:05Z

088-rebalance-progress-status.md

+**Estimation for intra-broker rebalance:**
+
+It is challenging to provide an accurate estimate for intra-broker rebalances without an estimate for disk read/write throughput and getting disk throughput is non-trivial for Strimzi.
+However, by using the network bandwidth in place of the disk throughput, we can provide a rough estimate of how long the rebalance would take.


"by using the network bandwidth in place of the disk throughput" why do you think that we can do this "replacement" to get a rough estimation? I am not sure about that. I was wondering if we should just avoid this estimation if we don't have a good way for that.

Assuming the disk throughput is always greater than the network bandwidth, an estimate using the network bandwidth could serve as an upper bound (theoretical maximum) on how long an intra-broker rebalance would take, i.e. the rebalance won't take longer than this. However, this contradicts the definition provided for the inter-broker rebalance and may cause confusion. We could simply set the value to N/A for now to avoid confusion/inaccuracy.

Would you mind if we left this estimate out for intra-broker balances @tomncooper?

I thought that is what we were going to do? Basically we don't include the intra estimate as we can't reliably calculate it. So the time to completion is always a theoretical minimum, it WILL take longer than this.

Yes I would leave this estimation out imho.

Updated proposal to explain this and suggest that we set it to "N/A"

ppatierno · 2025-01-07T09:40:17Z

088-rebalance-progress-status.md

+$$
+
+Notes
+- [1] The number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint.


Can we put the specific field we are using from the returned JSON?

ppatierno · 2025-01-07T09:40:32Z

088-rebalance-progress-status.md

+Notes
+- [1] The number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint.
+- [2] The time when the rebalance task was started, extracted from `triggeredTaskReason` field from the [/kafkacruisecontrol/state?substates=executor](#field-executorstate) for that task.
+- [3] The total number of megabytes planned to be moved for rebalance, provided from json payload of the [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint.


Can we put the specific field we are using from the returned JSON?
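
For readers following the notes above, here is a minimal Python sketch of how an estimate "based on the average rate of data transfer" could be derived from those three inputs. This is an assumed reading of the notes, not the proposal's exact formula; the function name, parameter names and rounding up to whole minutes are hypothetical.

```python
import math


def estimated_minutes_remaining(moved_mb, total_mb, elapsed_seconds):
    """Estimate remaining minutes from the average transfer rate observed so far.

    moved_mb        -- megabytes already moved (note [1])
    total_mb        -- megabytes planned to be moved (note [3])
    elapsed_seconds -- seconds since the task start time (derived from note [2])
    """
    if moved_mb <= 0 or elapsed_seconds <= 0:
        # No data moved yet, so no average rate is available.
        return None
    remaining_mb = max(total_mb - moved_mb, 0)
    # remaining / (moved / elapsed), written as one division to limit rounding error.
    remaining_seconds = remaining_mb * elapsed_seconds / moved_mb
    return math.ceil(remaining_seconds / 60)


# 8000 MB of 10000 MB moved in 20 minutes: 2000 MB left at 400 MB per minute.
print(estimated_minutes_remaining(8000, 10000, 1200))
```

Returning `None` before any data has moved is one way to avoid dividing by zero; how the operator would represent that case in the `ConfigMap` is not covered by this sketch.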

ppatierno · 2025-01-07T09:40:55Z

088-rebalance-progress-status.md

+
+Notes
+ - [1] The number of megabytes already moved by rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint.
+ - [2] The total number of megabytes planned to be moved for rebalance, provided by [/kafkacruisecontrol/state?substates=executor](#field-executorstate) REST API endpoint.


Can we put the specific fields we are using from the returned JSON for both the above?
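
The percentage described by these two notes can be sketched in the same way. This is a minimal illustration that assumes a whole-number result clamped to the range 0 to 100; the function name and the handling of an empty plan are assumptions, not part of the proposal.

```python
def completed_data_movement_percentage(moved_mb, total_mb):
    """Whole-number percentage of planned data already moved, clamped to 0..100."""
    if total_mb <= 0:
        # Assumption: with nothing planned to move, treat the rebalance as complete.
        return 100
    percentage = (moved_mb * 100) // total_mb
    return max(0, min(100, percentage))


print(completed_data_movement_percentage(8000, 10000))
```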

ppatierno · 2025-01-07T09:43:58Z

088-rebalance-progress-status.md

+
+For ease of implementation and minimizing the load on the CruiseControl REST API server, the operator will only query the `/kafkacruisecontrol/state?substates=executor` endpoint and update the `ConfigMap` upon `KafkaRebalance` resource reconciliation.
+
+In the event that Cruise Control runs into an error when rebalancing, the operator will transition the `KafkaRebalance` resource to the `NotReady` state, remove the `progress` section, and delete the progress `ConfigMap`.


"delete the progress ConfigMap" or "delete the progress section of the ConfigMap"?
Remember we are using the same ConfigMap for two purposes (also storing the before/after load data).

Is this consistent with what currently happens if we fail to get a response from CC in a single reconciliation? Does the CC client retry? Not sure setting the KR CR to NotReady for what could be a simple network blip is good UX.

"delete the progress ConfigMap" or "delete the progress section of the ConfigMap"?
Remember we are using the same ConfigMap for two purposes (also storing the before/after load data).

I am going to update the behavior described here to retain the progress information and ConfigMap when the KafkaRebalance resource moves to the NotReady state. As discussed in a previous thread with Paul, the progress information in the NotReady state may be just as useful for debugging as it is in the Stopped state.

Is this consistent with what currently happens if we fail to get a response from CC in a single reconciliation? Does the CC client retry? Not sure setting the KR CR to NotReady for what could be a simple network blip is good UX.

This line was intended to describe how the progress information will be updated when CC server returns "CompletedWithError" status for a task. From what I understood when writing this, this was the only situation where the KR resource was moved to the NotReady state. But looking closer at the code, it appears I am wrong.

It appears the CO will move the KR resource to the NotReady state when it fails to get a response from the CC server. It also looks like the CO CC client code does not retry when failing to get a response (unless I am missing some retry logic in the code). This means that if the CO fails to get a response from the CC server, it will set the KR CR to NotReady. I thought this only happened when a "CompletedWithError" response was returned by the CC server, not when there was a failed HTTP request.

The proposal suggests attempting to retrieve the executor status, but whether the retrieval succeeds or fails has no effect on the state of the KafkaRebalance resource.

Of course, the CC client, wherever it is used, should and will be implemented to retry.
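
To make the retry idea concrete, here is a minimal Python sketch of a bounded retry around a status fetch. It is only an illustration of the pattern: the Cluster Operator is written in Java, and the function name, the attempt count and the linear backoff are assumptions rather than anything specified in the proposal.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, retrying only on ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts:
                # Linear backoff between attempts; no wait after the final failure.
                sleep(delay_seconds * attempt)
    raise last_error
```

A caller would pass a function that performs the request for the executor state. If every attempt fails, the last error is raised, and the caller can then decide to keep the previously stored progress information rather than change the resource state.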

tomncooper

As discussed I think you should not estimate time to complete intra-broker movements in the proposal and just exclude them. The estimate will always be a theoretical minimum but it is useful to have a ball park.

tomncooper · 2025-01-08T11:29:53Z

088-rebalance-progress-status.md

@@ -0,0 +1,374 @@
+# Partition Rebalance Progress Status


Suggested change

# Partition Rebalance Progress Status

# Adding progress updates for Cruise Control rebalances

tomncooper · 2025-01-08T11:31:02Z

088-rebalance-progress-status.md

+
+At this time, Strimzi users are able to execute partition rebalances via `KafkaRebalance` custom resources but can only monitor the progression of those partition rebalances in two ways:
+
+- Manually querying the Cruise Control REST API endpoint directly. 


Don't we lock these down with a Network Policy? Would the user have to alter the default set up to get access?

Yes, there are a couple of things that would need special configuration to enable a user to access the CC REST API directly.

tomncooper · 2025-01-08T11:33:25Z

088-rebalance-progress-status.md

+In the `ConfigMap`, we will add the following fields:
+- **estimatedTimeToCompletionInMinutes**: The estimated amount time it will take in minutes until partition rebalance is complete.
+- **completedDataMovementPercentage**: The percentage of the data movement of the partition rebalance that is completed e.g. values in the range [0-100]%
+- **executorState**: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint.


Yeah, I agree you should give a brief summary of what this is and link to the OpenAPI def upstream.

tomncooper · 2025-01-08T11:38:21Z

088-rebalance-progress-status.md

+
+For ease of implementation and minimizing the load on the CruiseControl REST API server, the operator will only query the `/kafkacruisecontrol/state?substates=executor` endpoint and update the `ConfigMap` upon `KafkaRebalance` resource reconciliation.
+
+In the event that Cruise Control runs into an error when rebalancing, the operator will transition the `KafkaRebalance` resource to the `NotReady` state, remove the `progress` section, and delete the progress `ConfigMap`.


Is this consistent with what currently happens if we fail to get a response from CC in a single reconciliation? Does the CC client retry? Not sure setting the KR CR to NotReady for what could be a simple network blip is good UX.

fvaleri · 2025-01-08T16:25:18Z

088-rebalance-progress-status.md

+  progress: [1]
+    rebalanceProgressConfigMap: my-rebalance [2]
+```
+[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states.


Suggested change

[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states.

[1] The `progress` section will be visible during the `ProposalReady`, `Rebalancing`, `Stopped` and `Ready` states.

Signed-off-by: Kyle Liberti <[email protected]>

katheris · 2025-01-09T14:46:50Z

088-rebalance-progress-status.md

+This estimate will be a theoretical minimum derived from Cruise Control capacity and throttle configurations.
+This means that the cluster rebalance would take at least the estimated amount of time to complete.
+
+$$\text{maxPartitionMovements}_{[1]} = \min(\text{numberOfBrokers} \times \text{num.concurrent.partition.movements.per.broker}),\text{max.num.cluster.partition.movements})$$


I can't follow this calculation. Is it calculating the maximum number of non-concurrent partition movements? I can't work out why the number of brokers is needed. Isn't the worst case scenario that all movements have to happen from a single broker, so should it be max.num.cluster.partition.movements/num.concurrent.partition.movements.per.broker? Also it looks like you are missing a bracket somewhere, possibly at the beginning of numberOfBrokers x num.concurrent.partition.movements.per.broker?

is it calculating the maximum number of non-concurrent partition movements?

The maximum number of concurrent partition movements. Just updated the variable name to make this more clear.

I can't work out why the number of brokers is needed. Isn't the worst case scenario that all movements have to happen from a single broker, so should it be

This calculation is meant to be a theoretical minimum, the best case scenario, the least amount of time a rebalance would take to complete given ideal conditions (if the maximum allowed number of concurrent partitions movements per broker were moved concurrently and the available bandwidth was perfectly utilized). In reality, the rebalance will take longer than the theoretical minimum but it is still useful to know that the rebalance will take at least this estimated amount of time.

In the best case scenario, we are moving as many partitions concurrently as the brokers will allow. To calculate how many partitions can be moved concurrently cluster-wide, we need the number of brokers.

Does that make sense? Would it help if I added annotations/descriptions for the CC configurations that are used in the formulas?
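
As an illustration of the calculation being discussed, here is a minimal Python sketch of the cluster-wide concurrency cap. The function and parameter names are hypothetical, and the numbers are example values rather than Cruise Control defaults.

```python
def max_concurrent_partition_movements(number_of_brokers, per_broker_limit, cluster_limit):
    """Best-case number of partitions that can move at the same time.

    per_broker_limit -- num.concurrent.partition.movements.per.broker
    cluster_limit    -- max.num.cluster.partition.movements
    """
    return min(number_of_brokers * per_broker_limit, cluster_limit)


# Small cluster: the per-broker limit is the binding constraint (3 * 5 = 15).
small = max_concurrent_partition_movements(3, 5, 1250)

# Large cluster: the cluster-wide cap is the binding constraint.
large = max_concurrent_partition_movements(300, 5, 1250)

print(small, large)
```

The number of brokers matters because the per-broker limit applies to each broker independently, so the best case scales with the cluster size until the cluster-wide cap is reached.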

Also it looks like you are missing a bracket somewhere, possibly at the beginning of numberOfBrokers x num.concurrent.partition.movements.per.broker?

Yes, that is a typo! Thanks for spotting!

katheris · 2025-01-09T14:47:46Z

088-rebalance-progress-status.md

+It is challenging to provide an accurate estimate for intra-broker rebalances without an estimate for disk read/write throughput and getting disk throughput is non-trivial for Strimzi.
+Since we cannot accurately estimate `estimatedTimeToCompletionInMinutes` without knowing the disk throughput, we set `estimatedTimeToCompletionInMinutes` to `N/A`.
+
+$$\text{maxPartitionMovements}_{[1]} = \min\left(\text{numberOfBrokers} \times \text{num.concurrent.intra.broker.partition.movements.per.broker}),\text{max.num.cluster.movements}\right)$$


Similar here to previous comments, you're missing a bracket in the formula, but I'm also not clear what maxPartitionMovements is really representing

Signed-off-by: Kyle Liberti <[email protected]>

Add progress status for partition rebalances

169723b

Signed-off-by: Kyle Liberti <[email protected]>

kyguy requested review from tomncooper, scholzj, ppatierno and fvaleri November 26, 2024 07:10

scholzj reviewed Nov 26, 2024

tomncooper suggested changes Nov 26, 2024

fvaleri reviewed Dec 2, 2024

katheris reviewed Dec 2, 2024

ppatierno reviewed Dec 9, 2024

kyguy force-pushed the kr-exec-progress branch 7 times, most recently from d294906 to ff9df7e Compare December 11, 2024 00:07

Addressing feedback related to formatting/grammer

0f58cbb

Signed-off-by: Kyle Liberti <[email protected]>

kyguy force-pushed the kr-exec-progress branch from ff9df7e to 0f58cbb Compare December 11, 2024 00:58

Addressing feedback - js, ks, pp

d1433d8

Signed-off-by: Kyle Liberti <[email protected]>

kyguy force-pushed the kr-exec-progress branch 8 times, most recently from a310c4d to b55824e Compare December 18, 2024 02:10

Update wording and formatting

bc7f1ed

Signed-off-by: Kyle Liberti <[email protected]>

kyguy force-pushed the kr-exec-progress branch from b55824e to bc7f1ed Compare December 18, 2024 02:15

kyguy requested review from tinaselenge, fvaleri, katheris, ppatierno, scholzj and tomncooper December 18, 2024 14:59

scholzj reviewed Dec 18, 2024

ppatierno reviewed Dec 19, 2024

tomncooper reviewed Dec 19, 2024

fvaleri reviewed Dec 20, 2024

Addressing feedback - js, pp, fv, tc

2a6b7f6

Signed-off-by: Kyle Liberti <[email protected]>

scholzj reviewed Dec 23, 2024

scholzj requested review from tombentley, ppatierno, Frawless, sknot-rh, see-quick, samuel-hawker, PaulRMellor and im-konge December 23, 2024 20:27

PaulRMellor reviewed Jan 2, 2025

ppatierno reviewed Jan 7, 2025

tomncooper reviewed Jan 8, 2025

fvaleri reviewed Jan 8, 2025

Addressing feedback - js, pm, pp, tc, fv

083a960

Signed-off-by: Kyle Liberti <[email protected]>

katheris reviewed Jan 9, 2025

Addressing feedback - ks

1493f05

Signed-off-by: Kyle Liberti <[email protected]>


		In this “progress” section, we include the following fields:

		- estimatedTimeToCompletion: The minimum estimated amount time it will take in minutes until partition rebalance is complete.


		### Supported KafkaRebalance States

		For initial implementation we will focus on including the “progress” section only in the following KafkaRebalance states:

	For initial implementation we will focus on including the “progress” section only in the following KafkaRebalance states:
	For the initial implementation, we will focus on including the “progress” section only in the following KafkaRebalance states:


		helps users understand the cost of an ongoing partition rebalance, decide whether or not they should continue or cancel it, and know when future operations will be able to be safely executed.

		Further, having this information readily available and easily accessible via `KafkaRebalance` custom resources allows users and third-party tools like the Kubernetes CLI or Strimzi Console to easily track the progression of a partition rebalance.


		#### Adding “progress” section for other KafkaRebalance states

		In addition to the “progress” the “Rebalancing” and “Stopped” KafkaRebalance states, we could provide the “progress” section for other states as well such as the “ProposalReady” and “Ready” states.

	Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker balances.
	Leveraging the Cruise Control configurations and user-provided network capacity settings, we could provide a rough estimate for “estimatedTimeToCompletetion” field for inter-broker movements.

	One concern is that some of the fields like the “pendingParitionMovements” field can cause the JSON output to grow quite large.
	One concern is that some of the fields like the “pendingPartitionMovements” field can cause the JSON output to grow quite large.

	- percentageComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%
	- percentageDataMovementComplete: The percentage of the partition rebalance that is completed e.g. values in the range [0-100]%


		#### rebalanceProgressConfigMap

		Will only be present in “Rebalancing” and “Stopped” states.


		Given that its inclusion is not completely necessary and adds significant complexity to the proposal, it is out of scope for this proposal.

		#### Configurable verbosity for Executor State


		## Proposal

		This proposal extends the status section of the `KafkaRebalance` custom resource to include a `progress` section with a nested `rebalanceProgressConfigMap` field that references a `ConfigMap` that contains information related to an ongoing partition rebalance.


		it is best if we maintain the progress information somewhere else.

		#### Including “ExecutorState” in “afterBeforeLoadConfigmap”

		estimatedTimeToCompletionInMinutes: 5m [1]
		completedDataMovementPercentage: 80% [2]


		The estimated time it will take in minutes for a rebalance to complete based on the average rate of data transfer.

		The formulas used to calculate field value per `KafkaRebalance` state:

	[2] The `ConfigMap` containing information related to the ongoing partition rebalance
	[2] The `ConfigMap` containing information related to the ongoing partition rebalance.

	- executorState: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint.
	- executorState: The “non-verbose” JSON payload from the `/kafkacruisecontrol/state?substates=executor` endpoint, providing details about the executor's current status, including partition movement progress, concurrency limits, and total data to move.
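
To make the `executorState` description more concrete, here is a minimal Python sketch of pulling the progress-related values out of such a payload. The sample JSON is illustrative only, not captured output, and the field names (`numTotalPartitionMovements`, `numFinishedPartitionMovements`, `totalDataToMove`, `finishedDataMovement`) are assumptions that should be checked against the Cruise Control version in use. `summarize_executor_state` is a hypothetical helper, not part of Strimzi or Cruise Control.

```python
import json

# Illustrative payload only: field names and values are assumptions,
# not output captured from a real Cruise Control instance.
SAMPLE_PAYLOAD = """
{
  "ExecutorState": {
    "state": "INTER_BROKER_REPLICA_MOVEMENT_TASK_IN_PROGRESS",
    "numTotalPartitionMovements": 120,
    "numFinishedPartitionMovements": 96,
    "totalDataToMove": 5000,
    "finishedDataMovement": 4000
  }
}
"""


def summarize_executor_state(payload: str) -> dict:
    """Extract the progress-related values from an executor state payload.

    Hypothetical helper: missing fields default to 0 so that an idle
    executor does not raise a KeyError.
    """
    executor = json.loads(payload).get("ExecutorState", {})
    return {
        "state": executor.get("state"),
        "finished_partitions": executor.get("numFinishedPartitionMovements", 0),
        "total_partitions": executor.get("numTotalPartitionMovements", 0),
        "finished_data": executor.get("finishedDataMovement", 0),
        "total_data": executor.get("totalDataToMove", 0),
    }


summary = summarize_executor_state(SAMPLE_PAYLOAD)
print(summary)
```

Defaulting missing fields to zero is a design choice of this sketch; an operator implementation might instead treat a missing field as an error.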

	[2] The percentage complete of the ongoing rebalance in the range [0-100]%
	[2] The percentage complete of the ongoing rebalance in the range [0-100]%.


		The percentage of the data movement of the partition rebalance that is completed.

		The formulas used to calculate field value per `KafkaRebalance` state:

	The formulas used to calculate field value per `KafkaRebalance` state:
	The formulas used to calculate the field value differ for each applicable `KafkaRebalance` state:
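
The per-state formulas are not reproduced in this excerpt. As a rough sketch of the kind of calculation the two fields imply for an in-progress rebalance, assuming the estimate uses the average transfer rate observed so far, the following Python computes a completed percentage and a remaining time in minutes. The function names, parameters, and rounding choices are hypothetical and are not taken from the proposal.

```python
import math
from typing import Optional


def completed_percentage(finished_data: float, total_data: float) -> int:
    """Percentage of data movement completed, clamped to the range [0, 100].

    Assumption: a rebalance with nothing to move is reported as 100% complete.
    """
    if total_data <= 0:
        return 100
    return max(0, min(100, math.floor(100 * finished_data / total_data)))


def estimated_minutes_to_completion(
    finished_data: float, total_data: float, elapsed_seconds: float
) -> Optional[int]:
    """Remaining time in minutes, based on the average rate observed so far.

    Returns None when no data has moved yet, because no rate can be derived.
    """
    remaining = max(total_data - finished_data, 0)
    if remaining == 0:
        return 0
    if finished_data <= 0 or elapsed_seconds <= 0:
        return None
    # remaining / (finished_data / elapsed_seconds), rearranged to avoid
    # an intermediate floating-point rate.
    remaining_seconds = remaining * elapsed_seconds / finished_data
    return math.ceil(remaining_seconds / 60)


# 4000 of 5000 units moved in 20 minutes
print(completed_percentage(4000, 5000))
print(estimated_minutes_to_completion(4000, 5000, 1200))
```

With 4000 of 5000 units moved after 20 minutes, this yields 80 and 5, which lines up with the `80%` and `5m` sample values quoted earlier in the thread.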

	[1] Error message from failed Cruise Control REST API call
	[1] Error message from failed Cruise Control REST API call.


	Using the name of the ConfigMap we can view its data from the command line using the Kubernetes CLI.
	Using the name of the `ConfigMap`, we can view its data from the command line using the Kubernetes CLI.


		### Rejected Alternatives

		#### Maintaining progress fields in KafkaRebalance resource status

	#### Maintaining progress fields in KafkaRebalance resource status
	#### Maintaining progress fields in `KafkaRebalance` resource status


		For ease of implementation and to minimize the load on the Cruise Control REST API server, the operator will only query the `/kafkacruisecontrol/state?substates=executor` endpoint and update the `ConfigMap` upon `KafkaRebalance` resource reconciliation.

		In the event that Cruise Control runs into an error when rebalancing, the operator will transition the `KafkaRebalance` resource to the `NotReady` state, remove the `progress` section, and delete the progress `ConfigMap`.

	# Partition Rebalance Progress Status
	# Adding progress updates for Cruise Control rebalances


		At this time, Strimzi users are able to execute partition rebalances via `KafkaRebalance` custom resources but can only monitor the progression of those partition rebalances in two ways:

		- Manually querying the Cruise Control REST API endpoint directly.

	[1] The `progress` section will be visible during the `Ready`, `Rebalancing`, `Stopped` and `Ready` states.
	[1] The `progress` section will be visible during the `ProposalReady`, `Rebalancing`, `Stopped` and `Ready` states.