Ability to bump secondary ASG if primary is full #152

Open
zaafar opened this issue Aug 29, 2023 · 0 comments

zaafar commented Aug 29, 2023

Describe the feature request

[cluster-a-us-east-1-large-nodes-az1] outdated=10; updated=0; updatedAndReady=0; asgCurrent=10; asgDesired=10; asgMax=10
[cluster-a-us-east-1-large-nodes-az1][i-0xyz] Node already started rollout process
[cluster-a-us-east-1-large-nodes-az1][i-0xyz] Updated nodes do not have enough resources available, increasing desired count by 1
[cluster-a-us-east-1-large-nodes-az1][i-0xyz] Unable to increase ASG desired size: cannot increase ASG desired size above max ASG size
[cluster-a-us-east-1-large-nodes-az1][i-0xyz] Skipping

Imagine a world where primary, secondary, and tertiary ASGs are available to use. In that scenario, if the primary ASG is already at its max size, the secondary or tertiary ASG can be used instead without any issues.

Would it be possible to implement a feature where aws-eks-asg-rolling-update-handler bumps a secondary or tertiary ASG if the primary is full? It would determine whether two or more ASGs are grouped together using an ASG tag provided by the user (i.e. if tag X has value Y, the ASG belongs to group Y; if tag X has value Z, it belongs to group Z).

Why do you personally want this feature to be implemented?

So I don't have to manually bump the ASG max size.

How long have you been using this project?

8 months

Additional information

An easy win would be to expose this error as a separate Prometheus metric. While rolling_update_handler_errors is good, it doesn't differentiate between types of errors (alternatively, the error type could be added as a label on the rolling_update_handler_errors metric). This way I can create an alert when this happens rather than constantly monitoring the logs.
