Running antsMultivariateTemplateConstruction2 using SLURM/sbatch #1815

Open
ncbss opened this issue Dec 4, 2024 · 2 comments

Comments


ncbss commented Dec 4, 2024

Operating system and version

Rocky Linux 9.4 (Blue Onyx) & Apptainer Container (Ubuntu 18.04.6 LTS)

CPU architecture

x86_64 (PC, Intel Mac, other Intel/AMD)

ANTs code version

v2.4.2.dev1-g0e2ea40

ANTs installation type

Other docker image (please provide URL below)

Container information:

org.label-schema.build-arch: amd64
org.label-schema.build-date: Sunday_20_November_2022_21:30:16_PST
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: antsx/ants:master
org.label-schema.usage.singularity.version: 3.8.5

Summary of the problem

I am trying to run antsMultivariateTemplateConstruction2.sh with option -c 5 (SLURM) on our cluster at UBC, using a containerized version of ANTs. It looks like the script cannot find sbatch inside the container, so I get the error "do you have sbatch? if not, then choose another c option ... if so, then check where the sbatch alias points ...". Apologies if this is a naive question on my end, but I am a bit confused as to how I can use sbatch in this context. I have tried running the script locally in the cluster environment and submitting it to the scheduler via sbatch. Can you guys help?

Commands to reproduce the problem

#!/bin/sh
#SBATCH --time=05:00:00
#SBATCH --account=<account>
#SBATCH --job-name=study_t1w_template
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=20G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<email>
#SBATCH --output=<output>
#SBATCH --error=<error>

# #########################################################################################

# Load modules
module load apptainer/1.3.1

  apptainer exec \
    -B /path-to-template-dir/t1w_study_template \
    --cleanenv /path-to-container/ants-v2.4.2.sif \
  antsMultivariateTemplateConstruction2.sh \
    -d 3 \
    -c 5 \
    -u 03:00:00 \
    -v 10G \
    -i 4 \
    -r 1 \
    -o /path-to-template-dir/t1w_study_template/template/t1w_study_template_ \
      /path-to-template-dir/t1w_study_template/input/*T1w*.nii.gz >  /path-to-template-dir/t1w_study_template/logs/t1w_study_template.log

Output of the command with verbose output.

do you have sbatch?  if not, then choose another c option ... if so, then check where the sbatch alias points ...
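
The message above suggests the script could not locate sbatch on the PATH it sees inside the container. A minimal sketch for confirming that, assuming the placeholder container path from the post; `check_cmd` is a hypothetical helper, not part of ANTs:

```shell
#!/bin/sh
# check_cmd NAME: report whether NAME resolves on PATH in the current environment.
# (Hypothetical helper for illustration; not part of ANTs or SLURM.)
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# On the host (login or compute node) this is expected to report "found"
# on a SLURM cluster.
check_cmd sbatch

# To repeat the check inside the container, something like the following
# could be run by hand (placeholder path from the post):
#   apptainer exec --cleanenv /path-to-container/ants-v2.4.2.sif \
#     sh -c 'command -v sbatch || echo "sbatch: not found"'
```

If the host reports "found" and the container reports "not found", the scheduler client is simply not visible from inside the image, which is consistent with the error.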

Data to reproduce the problem

Not necessary.

Thank you!

Member

cookpa commented Dec 5, 2024

I think the only way to run parallel processing within the container is to use pexec. For example, on my LSF cluster, I can request 32 CPUs for a job. I can then run antsMultivariateTemplateConstruction2.sh inside the container with -c 2 -j 8 and ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=4.

The pexec script then runs 8 parallel processes, and each uses up to 4 threads. To conserve memory, I do 8 processes * 4 threads rather than 32 processes * 1 thread.
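
A rough sketch of how that split might look for the SLURM job in the original post. This is an assumption-laden adaptation, not a tested recipe: the paths are the placeholders from the post, `TOTAL_CPUS` is assumed to match a `#SBATCH --cpus-per-task=32` request, and `--env` is used to set the thread limit because `--cleanenv` would otherwise drop variables exported on the host. The command is only printed, not executed.

```shell
#!/bin/sh
# Sketch: pexec mode (-c 2) inside a single SLURM allocation.
# TOTAL_CPUS is assumed to match the job's "#SBATCH --cpus-per-task" request.
TOTAL_CPUS=32
THREADS_PER_JOB=4                         # threads each process may use
JOBS=$((TOTAL_CPUS / THREADS_PER_JOB))    # parallel processes, passed to -j

# Placeholder paths copied from the original post.
TEMPLATE_DIR=/path-to-template-dir/t1w_study_template
CONTAINER=/path-to-container/ants-v2.4.2.sif

# Build the command as a string so it can be reviewed before running.
CMD="apptainer exec -B ${TEMPLATE_DIR} --cleanenv \
--env ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=${THREADS_PER_JOB} ${CONTAINER} \
antsMultivariateTemplateConstruction2.sh -d 3 -c 2 -j ${JOBS} -i 4 -r 1 \
-o ${TEMPLATE_DIR}/template/t1w_study_template_ \
${TEMPLATE_DIR}/input/*T1w*.nii.gz"

# Dry run: print the command; run the printed line by hand once it looks right.
echo "${CMD}"
```

With these numbers the job runs 8 processes of up to 4 threads each, so the memory request should be sized for 8 concurrent registrations rather than 32.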

This reminds me I made some edits to the pexec script that I should commit (like polling less frequently).

I don't think this will help with the containerized version, but if you can run a locally installed ANTs, there is an alternative implementation from @gdevenyi:

https://github.com/CoBrALab/optimized_antsMultivariateTemplateConstruction

Contributor

gdevenyi commented Dec 6, 2024

The main difference in my implementation is that my pipeline stages the work graph using the cluster's dependency management system rather than polling. That in itself won't solve the containerization problem. However, if you want containerized ANTs, you may be able to use this: https://github.com/gdevenyi/singularity-exec-wrapper, a wrapper I wrote to expose binaries inside containers so they "act" as regular commands in PATH. This will not fix the problem that the pipeline can't see sbatch, however. You would need to have the ANTs .sh script as a non-containerized file, and the rest of the binaries could be the container ones.
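
To illustrate the general idea of exposing container binaries as ordinary commands, here is a hypothetical sketch. It is NOT the code of singularity-exec-wrapper, whose actual implementation may differ; the wrapper directory, the `make_wrapper` helper, and the list of tools are assumptions chosen for illustration, and the container path is the placeholder from the post.

```shell
#!/bin/sh
# Hypothetical illustration only; not the singularity-exec-wrapper implementation.
CONTAINER=/path-to-container/ants-v2.4.2.sif   # placeholder path from the post
WRAPPER_DIR="${TMPDIR:-/tmp}/ants-wrappers"    # assumed location for the wrappers
mkdir -p "${WRAPPER_DIR}"

# make_wrapper NAME: write a tiny host-side script that forwards all of its
# arguments to the binary of the same name inside the container.
make_wrapper() {
  cat > "${WRAPPER_DIR}/$1" <<EOF
#!/bin/sh
exec apptainer exec "${CONTAINER}" "$1" "\$@"
EOF
  chmod +x "${WRAPPER_DIR}/$1"
}

# A few ANTs binaries as examples; a real setup would need every tool the
# template script calls.
for tool in antsRegistration antsApplyTransforms AverageImages; do
  make_wrapper "${tool}"
done

# With the wrappers first on PATH, a host copy of the ANTs .sh script would
# call the containerized binaries while sbatch stays visible on the host:
#   PATH="${WRAPPER_DIR}:${PATH}"
```

Data directories would still need to be bound into the container (for example with `-B`) for the wrapped binaries to see the input and output files.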

> For example, on my LSF cluster,

@cookpa Mostly unrelated... I have unmerged/untested LSF support in my qbatch cluster abstraction tool: CoBrALab/qbatch#51. Would you be willing to test it out if I fix it up? That would mean a bunch of my software pipelines would gain LSF support...
