Commit
LFX: Add Volcano project for March - May term 2024
Signed-off-by: Xuzheng Chang <[email protected]>
Showing 1 changed file with 27 additions and 0 deletions.
@@ -133,3 +133,30 @@
- [Manan Gupta](https://github.com/GuptaManan100) ([email protected])
- [Harshit Gangal](https://github.com/harshit-gangal) ([email protected])
- Issue: <https://github.com/vitessio/vitess/issues/14931>

### Volcano

#### Volcano supports multi-cluster AI workload scheduling

- Description: Volcano provides rich scheduling capabilities for AI workloads within a single cluster. In large-model training scenarios, however, a single cluster often cannot meet a job's computing power requirements, and more and more users want to submit jobs uniformly across multiple clusters. Volcano therefore needs to provide scheduling capabilities such as job management, gang scheduling, and queue management across clusters, and to select an appropriate cluster for each job.
- Expected Outcome:
  - Implement a basic multi-cluster scheduling framework integrated with a multi-cluster scheduler such as [Karmada](https://github.com/karmada-io/karmada) or another multi-cluster orchestrator.
  - Implement gang scheduling and fair scheduling across multiple clusters.
  - Implement queue management across multiple clusters.
- Recommended Skills: Go, Kubernetes, Volcano
- Mentor(s):
  - william wang (@william-wang, [email protected])
  - Xuzheng Chang (@Monokaix, [email protected])
- Upstream Issue: https://github.com/volcano-sh/volcano/issues/3310

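To make the gang scheduling outcome above concrete, here is a minimal sketch in Go of all-or-nothing cluster selection. It is not taken from Volcano or Karmada: the `Cluster` and `Job` types, their fields, and `SelectCluster` are hypothetical names chosen for illustration, and capacity is reduced to a single GPU count per cluster.

```go
package main

import "fmt"

// Cluster is a hypothetical view of one member cluster's free capacity.
type Cluster struct {
	Name     string
	FreeGPUs int
}

// Job is a hypothetical gang job: it is only useful if MinAvailable
// pods can start together.
type Job struct {
	Name         string
	MinAvailable int
	GPUsPerPod   int
}

// SelectCluster returns the first cluster that can host the whole gang.
// If no single cluster fits every pod, the job is not placed at all,
// which is the all-or-nothing property of gang scheduling.
func SelectCluster(job Job, clusters []Cluster) (string, bool) {
	need := job.MinAvailable * job.GPUsPerPod
	for _, c := range clusters {
		if c.FreeGPUs >= need {
			return c.Name, true
		}
	}
	return "", false
}

func main() {
	clusters := []Cluster{
		{Name: "cluster-a", FreeGPUs: 4},
		{Name: "cluster-b", FreeGPUs: 16},
	}
	job := Job{Name: "llm-train", MinAvailable: 4, GPUsPerPod: 2}

	if name, ok := SelectCluster(job, clusters); ok {
		fmt.Printf("job %s -> %s\n", job.Name, name)
	} else {
		fmt.Printf("job %s stays queued\n", job.Name)
	}
}
```

A real implementation would also have to account for queues, fairness between tenants, and capacity that changes while a decision is in flight; the sketch only shows why partial placement is rejected.
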
#### Volcano supports DRA integration

- Description: [DRA](https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/) is a new-generation device management mechanism for Kubernetes. It introduces a new resource request API, `ResourceClaim`, which requires the kubelet, kube-controller-manager, scheduler, and third-party device management controllers to cooperate. The kube-scheduler has already implemented the corresponding scheduling capabilities; Volcano also needs to implement a DRA scheduling plugin to integrate the DRA function.
- Expected Outcome:
  - A design document describing how to integrate DRA into Volcano.
  - Implement a DRA plugin in Volcano.
- Recommended Skills: Go, Kubernetes, Volcano
- Mentor(s):
  - william wang (@william-wang, [email protected])
  - Xuzheng Chang (@Monokaix, [email protected])
- Upstream Issue: https://github.com/volcano-sh/volcano/issues/3143