
Preemption does not meet expectations #3909

Open

dafu-wu opened this issue Dec 25, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


dafu-wu commented Dec 25, 2024

What happened:
A PyTorchJob occupying three nodes (1 master, 2 workers) and 24 GPUs is running normally with workload priority 1000. A PyTorchJob with the same configuration but a higher priority (the high-priority class in the repro below) is then submitted to the same ClusterQueue. The high-priority job cannot preempt the low-priority one and reports the following error:

  conditions:
  - lastTransitionTime: "2024-12-25T02:15:59Z"
    message: 'couldn''t assign flavors to pod set master: insufficient unused quota
      for nvidia.com/gpu in flavor multi-node-h100, 8 more needed, insufficient unused
      quota for nvidia.com/gpu in flavor single-node-h100, 8 more needed; couldn''t
      assign flavors to pod set worker: insufficient quota for nvidia.com/gpu in flavor
      multi-node-h100, request > maximum capacity (16 > 8), insufficient quota for
      nvidia.com/gpu in flavor single-node-h100, request > maximum capacity (24 >
      16)'
    observedGeneration: 1
    reason: Pending
    status: "False"
    type: QuotaReserved 

However, a PyTorchJob with 1 master, 2 workers, and 16 GPUs in total, submitted to the same ClusterQueue with priority 1000, is admitted and works.
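
As a rough illustration of the numbers in the condition message (a hedged sketch, not Kueue's flavorassigner code; the "maximum capacity" values are taken verbatim from the message and happen to equal the flavors' nominalQuota in the ClusterQueue below):

```go
// Hedged sketch of the capacity comparisons quoted in the workload condition;
// an illustration only, not Kueue's implementation.
package main

import "fmt"

func main() {
	// "maximum capacity" values as reported in the condition message.
	const multiNodeCap, singleNodeCap = 8, 16

	// Pod set requests of the failing job.
	const masterRequest = 1 * 8 // 1 master x 8 GPUs
	const workerRequest = 2 * 8 // 2 workers x 8 GPUs

	// multi-node-h100: "request > maximum capacity (16 > 8)"
	fmt.Println("worker fits multi-node-h100:", workerRequest <= multiNodeCap) // false
	// single-node-h100: "request > maximum capacity (24 > 16)"; the 24 looks like
	// worker + master counted against the same flavor (assumption).
	fmt.Println("worker+master fit single-node-h100:", workerRequest+masterRequest <= singleNodeCap) // false
	// A job needing only 16 GPUs in total stays within single-node-h100 (16 <= 16),
	// which matches the 16-GPU job above being admitted.
	fmt.Println("16-GPU job fits single-node-h100:", 16 <= singleNodeCap) // true
}
```

In both flavors the 24-GPU job's request exceeds the reported maximum capacity, which suggests it is rejected at the flavor-assignment step rather than ever reaching preemption.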

What you expected to happen:
The high-priority PyTorchJob, configured identically to the low-priority one, should be able to preempt the low-priority PyTorchJob.

How to reproduce it (as minimally and precisely as possible):

  • ClusterQueue, ResourceFlavors, and LocalQueue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue-asr
spec:
  namespaceSelector: {} 
  cohort: h100-dev
  flavorFungibility:
    whenCanBorrow: TryNextFlavor
    whenCanPreempt: Preempt
  preemption:
    borrowWithinCohort:
      maxPriorityThreshold: 100
      policy: LowerPriority
    reclaimWithinCohort: LowerPriority
    withinClusterQueue: LowerPriority
  queueingStrategy: BestEffortFIFO
  resourceGroups:
  - coveredResources:
    - nvidia.com/gpu
    flavors:
    - name: single-node-h100
      resources:
      - name: nvidia.com/gpu
        nominalQuota: "16"
        borrowingLimit: "8"
    - name: multi-node-h100
      resources:
      - name: nvidia.com/gpu
        nominalQuota: "8"
        borrowingLimit: "8"
  stopPolicy: None
---

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: multi-node-h100
spec:
  nodeLabels:
    mlp: multi-node

---

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: single-node-h100
spec:
  nodeLabels:
    mlp: single-node
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: asr
spec:
  clusterQueue: cluster-queue-asr
  • pytorch-simple-low-priority
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-simple-low-priority
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: asr
    kueue.x-k8s.io/priority-class: low-priority
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:v1beta1-21320b6
#              If you have gpu, pytorch-mnist-gpu would be helpful. pytorch-mnist-gpu is approximately 22GB
#              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:latest
              imagePullPolicy: Always
              command:
                - "python3"
                - "/opt/pytorch-mnist/mnist.py"
                - "--epochs=100000"
              resources:
                limits:
                  nvidia.com/gpu: 8

    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:v1beta1-21320b6
#              If you have gpu, pytorch-mnist-gpu would be helpful. pytorch-mnist-gpu is approximately 22GB
#              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:latest
              imagePullPolicy: Always
              command:
                - "python3"
                - "/opt/pytorch-mnist/mnist.py"
                - "--epochs=100000"
              resources:
                limits:
                  nvidia.com/gpu: 8
  • pytorch-simple-high-priority
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-simple-high-priority
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: asr
    kueue.x-k8s.io/priority-class: high-priority
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:v1beta1-21320b6
#              If you have gpu, pytorch-mnist-gpu would be helpful. pytorch-mnist-gpu is approximately 22GB
#              image: docker.io/kubeflowkatib/pytorch-mnist-cpu:latest
              imagePullPolicy: Always
              command:
                - "python3"
                - "/opt/pytorch-mnist/mnist.py"
                - "--epochs=1"
              resources:
                limits:
                  nvidia.com/gpu: 8

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.29.5
  • Kueue version (use git describe --tags --dirty --always): 0.10.0
  • OS (e.g: cat /etc/os-release): Ubuntu 22.04.5 LTS
  • Kernel (e.g. uname -a): Linux scl-c26-r3-svr05 5.15.0-117-generic
dafu-wu added the kind/bug label Dec 25, 2024
dafu-wu (Author) commented Dec 25, 2024

Should low-priority workloads be considered in maxCapacity?

maxCapacity := a.cq.PotentialAvailable(fr)

https://github.com/kubernetes-sigs/kueue/blame/95bc3b28da49862186f5e38924e1507ce4b7c703/pkg/scheduler/flavorassigner/flavorassigner.go#L608C22-L608C40
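
To make the question concrete, here is a hedged sketch (not the flavorassigner code; the low-priority job's per-flavor usage is an assumed number) contrasting a fit check that compares the request only against the flavor's maximum capacity with a hypothetical variant that also counts quota held by preemptible, lower-priority workloads:

```go
// Hedged sketch of the question above; maxCapacity stands in for the value
// returned by PotentialAvailable, and preemptibleUsage is an assumption.
package main

import "fmt"

// fits models a per-flavor check for a single resource (nvidia.com/gpu here).
func fits(request, maxCapacity, preemptibleUsage int64, countPreemptible bool) bool {
	if countPreemptible {
		// Hypothetical variant: quota currently held by lower-priority
		// workloads is treated as capacity that preemption could free up.
		return request <= maxCapacity+preemptibleUsage
	}
	// Reading suggested by the condition message: the request is compared
	// against the flavor's maximum capacity alone.
	return request <= maxCapacity
}

func main() {
	const workerRequest = 16   // 2 workers x 8 GPUs from the high-priority job
	const maxCapacity = 8      // value reported for multi-node-h100
	const preemptibleUsage = 8 // assumed GPUs held by the low-priority job in this flavor

	fmt.Println("ignoring preemptible usage:", fits(workerRequest, maxCapacity, preemptibleUsage, false)) // false -> stays Pending
	fmt.Println("counting preemptible usage:", fits(workerRequest, maxCapacity, preemptibleUsage, true))  // true  -> preemption could be attempted
}
```

Whether PotentialAvailable is meant to include quota that preemption could reclaim is the open question here.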
