Train model in slurm cluster. #1861
mrbeann asked this question in Community | Q&A
-
Hi @mrbeann, I don't think torchrun works well with Slurm. To handle this kind of issue, we provide our own launcher for the Slurm platform. Please refer to https://colossalai.org/docs/basics/launch_colossalai#launch-with-slurm and give it a try.
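A rough illustration of why a Slurm-specific launcher can replace torchrun: when a script is started with `srun`, Slurm sets per-task environment variables such as `SLURM_PROCID` (global task index), `SLURM_LOCALID` (task index on the node), and `SLURM_NTASKS` (total number of tasks), so each process can read its own rank directly. The sketch below assumes only those variables; `dist_info_from_slurm` and `DistInfo` are hypothetical names for illustration, not ColossalAI's API or its actual implementation.

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    """Hypothetical container for the values a distributed launcher needs."""
    rank: int
    local_rank: int
    world_size: int


def dist_info_from_slurm(environ: Mapping[str, str] = os.environ) -> DistInfo:
    """Hypothetical helper: derive ranks from variables that srun sets per task.

    It only illustrates the idea behind a Slurm-aware launcher; it is not
    how ColossalAI implements its own Slurm launcher.
    """
    try:
        rank = int(environ["SLURM_PROCID"])         # global task index
        local_rank = int(environ["SLURM_LOCALID"])  # task index on this node
        world_size = int(environ["SLURM_NTASKS"])   # total tasks in the job step
    except KeyError as missing:
        # These variables are only present when the process was started by srun.
        raise RuntimeError(f"{missing} is not set; start this script with srun") from None
    return DistInfo(rank, local_rank, world_size)


if __name__ == "__main__":
    print(dist_info_from_slurm())
```

For example, `srun --nodes=2 --ntasks-per-node=4 python train.py` would start 8 tasks, each seeing a distinct `SLURM_PROCID` from 0 to 7 and a `SLURM_LOCALID` from 0 to 3 on its node. A real launcher additionally needs a master address and port to initialize the process group, which the linked documentation covers.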
-
I am trying to use ColossalAI on a cluster managed by Slurm. I first open an interactive shell with a command like
srun --pty /bin/bash
and then run the starter example. However, it raises the following error. Does anyone have an idea what causes this?