This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

feat: Add successpolicy #181

Open
wants to merge 1 commit into
base: master

Conversation

gaocegege
Member

SuccessPolicy is used in both PyTorchJob and TFJob. Thus I propose to add it in common.

Signed-off-by: cegao [email protected]
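A minimal sketch of what a shared `SuccessPolicy` type might look like once it lives in common. The constant names mirror the ones mentioned in this thread and in the existing TFJob API. The `workerStatus` struct and the `jobSucceeded` helper are hypothetical, introduced only for illustration. The semantics shown (default: the leader replica decides; `AllWorkers`: every worker must succeed) are an assumption based on TFJob's existing behavior, not something this PR defines.

```go
package main

import "fmt"

// SuccessPolicy names the rule used to decide when a job counts as succeeded.
type SuccessPolicy string

const (
	// SuccessPolicyDefault: assumed to mean the leader replica
	// (for example chief or worker 0) decides job success.
	SuccessPolicyDefault SuccessPolicy = ""
	// SuccessPolicyAllWorkers: assumed to mean every worker must succeed.
	SuccessPolicyAllWorkers SuccessPolicy = "AllWorkers"
)

// workerStatus is a hypothetical summary of replica state,
// not a type from kubeflow/common.
type workerStatus struct {
	Total           int  // number of worker replicas
	Succeeded       int  // number of workers that have succeeded
	LeaderSucceeded bool // whether the leader replica has succeeded
}

// jobSucceeded is a hypothetical helper that applies the policy
// to a status summary.
func jobSucceeded(p SuccessPolicy, s workerStatus) bool {
	switch p {
	case SuccessPolicyAllWorkers:
		return s.Total > 0 && s.Succeeded == s.Total
	default:
		return s.LeaderSucceeded
	}
}

func main() {
	s := workerStatus{Total: 3, Succeeded: 1, LeaderSucceeded: true}
	fmt.Println(jobSucceeded(SuccessPolicyDefault, s))
	fmt.Println(jobSucceeded(SuccessPolicyAllWorkers, s))
}
```

Keeping the policy as a string enum means a job spec that omits the field falls back to the default case, which is one reason an empty string is a convenient default value.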

@gaocegege
Member Author

/assign @terrytangyuan @Jeffwan @zw0610

Member

terrytangyuan left a comment

This is great, although I wonder if we can use this opportunity to rethink the approach. Instead of enums, would a label selector, similar to what Argo Workflows does here, be more flexible and allow more fine-grained control over the success conditions/policies? https://github.com/argoproj/argo-workflows/blob/master/examples/k8s-kubeflow-jobs.yaml#L24-L25

@gaocegege
Member Author

That approach is used in Katib. I think it works, but we should still support successPolicy to keep the API consistent.

cc @andreyvelich

@andreyvelich
Member

Yes, we are using successCondition in GJSON format in our APIs to define the conditions for Katib Trial's Workers, similar to Argo Workflows.
We could probably think about how to use it in the Training Operators.

@terrytangyuan
Member

Let's use kubeflow/training-operator#1507 to track and discuss separately.

Member

terrytangyuan left a comment

This is LGTM for consistency. Others PTAL.

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gaocegege
Member Author

/assign @zw0610 @Jeffwan

@zw0610
Member

zw0610 commented Dec 14, 2021

LGTM. Meanwhile, could you add descriptions for SchedulingPolicy and SuccessPolicyAllWorkers to explain the expected behavior?

@gaocegege
Member Author

SGTM

@gaocegege
Member Author

/hold

5 participants