Initial setup of Payu environment #1

Merged · 23 commits · Dec 19, 2024
Conversation

@jo-basevi (Collaborator) commented Sep 26, 2024

This PR has some initial work setting up containerized squashfs conda environments for payu, building on the work Dale did for hh5's containerised conda analysis environments (https://github.com/coecms/cms-conda-singularity). As a quick overview, this PR:

  • Adds payu and payu-dev environment configurations (payu-dev is similar, except payu is installed with pip from the main payu branch on GitHub).
  • Replaces existing ssh-related third-party actions in workflows with ACCESS-NRI/actions.
  • Moves some project-specific configurations out of the scripts into GitHub variables.

A nice thing about the cms-conda-singularity scripts is that they already work out of the box for building Python virtual environments on top of the squashfs conda environments. This is useful for the repro CI tests run using payu in model-config-tests (https://github.com/ACCESS-NRI/model-config-tests/). So I was able to use virtual environments to run the reproducibility tests for an ACCESS-OM2 configuration (tag: release-1deg_jra55_ryf-2.0) and an ACCESS-ESM1.5 configuration (tag: release-historical+concentrations-1.1), using payu and payu-dev as the base conda environments, and everything passed.

I've manually run the scripts in the workflows for building, testing and deploying environments (the latest installs are in /g/data/tm70/jb4202/tmp-conda/). I am holding off running any CI deployment-to-Gadi workflows until installation paths and variables are finalised.

Notes:

  • Base installation paths need to be in /g/data/:
    Initially, I ran into errors when manually running the build scripts: directories did not exist and the squashfs image was not set up correctly. The reason was that I was using /scratch directories as base directories (e.g. CONDA_BASE) rather than /g/data/ - the build scripts assume that the base directories where the environments will eventually be deployed start with /g.
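
    A quick illustration (the /g/data path is the test directory used later in this PR; the /scratch path is hypothetical):

    # Works: a base directory under /g/data, where the environments will
    # eventually be deployed
    export CONDA_BASE=/g/data/tm70/jb4202/tmp-conda/prerelease

    # Fails: a /scratch base directory - the build scripts assume paths
    # starting with /g, so directories and the squashfs image are set up wrong
    # export CONDA_BASE=/scratch/tm70/jb4202/tmp-conda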

  • Pip-installed packages:
    Existing payu development environments install payu from the main branch. Pip-installed packages had incorrect shebang headers pointing to a directory on /jobfs/ where the environment was initially built. There is already an issue for this: Issue with deployment of pip installed python packages with command line tools ACCESS-Analysis-Conda#78. I used Romain's solution here (see the sketch below): https://github.com/ACCESS-NRI/MED-condaenv/blob/2c0f730b54cfa6a19b6df4300f8dd27cf3b877d0/environments/esmvaltool/build_inner.sh#L9
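
    The gist of that fix is rewriting the shebang lines after the pip install. A minimal sketch, assuming the inner environment prefix is available as $CONDA_PREFIX (see the linked build_inner.sh for the actual command used):

    # Point each pip-generated entry-point script at the environment's python
    # instead of the /jobfs/ path where the environment was built
    for f in "${CONDA_PREFIX}"/bin/payu*; do
        sed -i "1s|^#!.*python.*$|#!${CONDA_PREFIX}/bin/python|" "$f"
    done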

  • Payu PBS qsub calls:
    Payu submits jobs similar to qsub -- path/to/env/python path/to/env/payu-run (when running the command payu run). This path/to/env/python would point to a Python executable only accessible inside the container. Each of the environment commands in the container has a corresponding script outside the container (a symlink to launcher.sh) that launches the container and then runs the command inside it. I noticed when testing the conda_concept/analysis modules in /g/data/hh5/public/modules/ that running the launcher python script with a payu command gives a sys.executable that points back to the launcher python script. So running /g/data/hh5/public/apps/cms_conda_scripts/analysis3-24.04.d/bin/python /g/data/hh5/public/apps/cms_conda/envs/analysis3-24.04/bin/payu run would pass the launcher python script along to subsequent payu qsub submits.

    So, as a somewhat hacky fix, I modified the Python shebang for the payu command to use the outside Python launcher script. (Why does sys.executable point to the Python launcher script? I think because launcher.sh preserves the original argv[0] by using exec -a, e.g. exec -a /path/to/outside/python /path/to/inner-env/python /path/to/inner-env/payu-run)

    An alternative solution to the above would be to modify the payu source code to add the launcher script to the qsub commands. E.g.

    1. Check if inside a container (i.e. if a SINGULARITY environment variable is set, e.g. `SINGULARITY_ENVIRONMENT` or `SINGULARITY_NAME`)
    2. Check if there is a `LAUNCHER_SCRIPT` environment variable (this might be specific to these environments)
    3. If both are set, submit `qsub -- $LAUNCHER_SCRIPT path/to/env/bin/python path/to/env/bin/payu-run` (sketched below)
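
    A minimal bash sketch of that logic (variable names as proposed above; payu itself would implement the same check in Python, so this is illustrative only):

    # Prefix the qsub payload with the launcher script only when running
    # inside a Singularity container that provides LAUNCHER_SCRIPT
    payload=(path/to/env/bin/python path/to/env/bin/payu-run)
    if [[ -n "${SINGULARITY_NAME:-}${SINGULARITY_ENVIRONMENT:-}" && -n "${LAUNCHER_SCRIPT:-}" ]]; then
        payload=("${LAUNCHER_SCRIPT}" "${payload[@]}")
    fi
    qsub -- "${payload[@]}"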
    

    This approach hard-codes a custom environment variable into payu - though it might make it easier for others to run payu inside a container, as they would only need the LAUNCHER_SCRIPT environment variable to be defined. However, I am not sure how to guarantee this variable points to the correct script, i.e. the one that launches the container containing the payu environment.

    After chatting with Aidan, another solution would be if (when) payu ends up using hpcpy (https://github.com/ACCESS-NRI/hpcpy) and payu had a templated script for its qsub calls; the build scripts in this repository could then modify that template to add in the launcher script. There are also existing override command scripts in this repository, so there is probably another solution to this problem. In the meantime, while I am testing, I'm using the modified shebang header for payu commands as it doesn't require changes to payu.

  • GitHub Environment Variables:
    @aidanheerdegen suggested moving the project-specific installation paths to GitHub, where they can be set via GitHub Environment variables, so paths can be changed without modifying the source code. Initially, I moved just ADMIN_DIR (the base directory for logs and staged environment tar files) and CONDA_BASE (the base directory which will contain the apps/ and modules/ subdirectories). As the paths also affect other configuration settings - e.g. the project and storage flags passed to build qsub calls, and the groups used for configuring file permissions of the admin and deployed directories (APPS_USERS_GROUP and APPS_OWNERS_GROUP) - I moved those to GitHub variables as well.

    Proposed GitHub variable settings for the Gadi environment:

    • CONDA_BASE: /g/data/vk83/prerelease (the directory that contains the apps/ and modules/ subdirectories)
    • ADMIN_DIR: /g/data/vk83/admin/conda_containers/prerelease (directory to store staging and log files, tar files of conda environments, and backups of old environment squashfs files)
    • APPS_USERS_GROUP: vk83 (read and execute permissions for files installed to apps and modules)
    • APPS_OWNERS_GROUP: vk83_w ? (read/write/execute permissions for installed files)
    • PROJECT: tm70 (project for build and test PBS jobs)
    • STORAGE: gdata/vk83 (storage directives for build and test PBS jobs)
    • secrets.REPO_PATH: ? (the path this repository is rsynced to, and from which all the scripts are run)

    The above settings, the install_config.sh settings, and the current conda environments would add the following to /g/data/vk83/prerelease/:

    ├── apps
    │   ├── base_conda
    │   │   ├── bin
    │   │   │   └── micromamba
    │   │   ├── envs
    │   │   │   ├── payu -> payu-1.1.5
    │   │   │   ├── payu-1.1.5 -> /opt/conda/payu-1.1.5
    │   │   │   ├── payu-1.1.5.sqsh
    │   │   │   ├── payu-dev -> /opt/conda/payu-dev
    │   │   │   ├── payu-dev.sqsh
    │   │   │   └── payu-unstable -> payu-1.1.5
    │   │   └── etc
    │   │       └── base.sif
    │   └── conda_scripts
    │       ├── launcher_conf.sh
    │       ├── launcher.sh
    │       ├── overrides
    │       │   ├── functions.sh
    │       │   ├── jupyter.config.sh
    │       │   ├── mpicc.config.sh
    │       │   ├── pbs_tmrsh.sh
    │       │   └── ssh.sh
    │       ├── payu-1.1.5.d
    │       │   ├── bin
    │       │   │   # Launch-script symlinks (left out here), e.g. payu -> launcher.sh, python3 -> launcher.sh
    │       │   │   ├── launcher_conf.sh
    │       │   │   └── launcher.sh
    │       │   └── overrides
    │       │       ├── functions.sh -> ../../overrides/functions.sh
    │       │       ├── jupyter.config.sh -> ../../overrides/jupyter.config.sh
    │       │       ├── mpicc.config.sh -> ../../overrides/mpicc.config.sh
    │       │       ├── pbs_tmrsh.sh -> ../../overrides/pbs_tmrsh.sh
    │       │       └── ssh.sh -> ../../overrides/ssh.sh
    │       ├── payu.d -> payu-1.1.5.d
    │       ├── payu-dev.d
    │       │   ├── bin
    │       │   │   # Launch-script symlinks (left out here), e.g. payu -> launcher.sh
    │       │   │   ├── launcher_conf.sh
    │       │   │   └── launcher.sh
    │       │   └── overrides
    │       │       ├── functions.sh -> ../../overrides/functions.sh
    │       │       ├── jupyter.config.sh -> ../../overrides/jupyter.config.sh
    │       │       ├── mpicc.config.sh -> ../../overrides/mpicc.config.sh
    │       │       ├── pbs_tmrsh.sh -> ../../overrides/pbs_tmrsh.sh
    │       │       └── ssh.sh -> ../../overrides/ssh.sh
    │       └── payu-unstable.d -> payu-1.1.5.d
    └── modules
        └── conda_container
            ├── payu-1.1.5 -> .common_v3
            └── payu-dev -> .common_v3
    

    So loading the modules would be:

    module use /g/data/vk83/prerelease/modules
    module load conda_container/payu # or conda_container/payu-1.1.5 or conda_container/payu-dev
    

    I've named the micromamba install directory base_conda and the module name conda_container so they do not clash with existing conda/ directories in vk83.

Issues: (TODO: split off into separate GitHub issues)

  • Different locations for release and pre-release environments, and automatic payu-dev updates (see Automatic updates to payu-dev environment #2)
  • Investigate using conda-pack environments similarly to the workflows in https://github.com/ACCESS-NRI/payu-condaenv. Would using conda-pack environments simplify things or not?
  • Process for deprecating and deleting old payu environments?
  • Switch to installing an official micromamba when a pre-existing environment does not exist (see Using official Micromamba install #3)
  • To get git signing working, I removed the settings in environment/config.sh that removed "openssh-clients", "openssh-server" and "openssh" from the environment and included an outside "ssh" command. The CMS documentation for the conda environments (https://climate-cms.org/cms-wiki/resources/resources-conda-setup.html#technical-details) says: "As a part of the installation process, the openssh packages are removed from the conda installation, which forces use of the system ssh and, more importantly, its configuration." So I am wondering if I will accidentally break something by removing those settings.
  • Workflows: GitHub deployment to the Gadi environment is triggered at the Setup, Build and Test jobs. As the settings for the Gadi environment require reviewers, this will require many signoffs in a pull request. This is fine for the testing stage, as one can run through the logs and manually check things between each step, but it might be unnecessary later on. Could the jobs be moved into one job so deploying to Gadi only requires one signoff?

@jo-basevi jo-basevi force-pushed the setup-payu-environment branch from 06b8986 to 58da917 on September 27, 2024 02:42
@jo-basevi jo-basevi force-pushed the setup-payu-environment branch from 58da917 to 74a8e02 on September 27, 2024 03:12
@jo-basevi (Collaborator, Author)

Just some more quick details on how this has been tested. I manually ran all the build/test/deploy commands in the workflows; the commands are included here for reference. The PBS logs for the build/test scripts are in $JOB_LOG_DIR (/g/data/tm70/jb4202/tmp-conda/admin/conda_containers/logs for the manual tests). In the tests I used a REPO_PATH in my home directory, which contains a built base container file (container/base.sif); I've rsynced it here for reference: /g/data/tm70/jb4202/tmp-conda/model-release-condaenv. To run the commands for payu-dev, use CONDA_ENVIRONMENT="payu-dev".

Setup command
bash << 'EOF'
set -e
REPO_PATH=/home/189/jb4202/model-release-condaenv
export ADMIN_DIR="/g/data/tm70/jb4202/tmp-conda/admin/conda_containers"
export CONDA_BASE="/g/data/tm70/jb4202/tmp-conda/prerelease"
export APPS_USERS_GROUP="tm70"
export APPS_OWNERS_GROUP="tm70"

source "$REPO_PATH/scripts/install_config.sh"
source "$REPO_PATH/scripts/functions.sh"
mkdir -p "${ADMIN_DIR}" "${JOB_LOG_DIR}" "${BUILD_STAGE_DIR}"
set_admin_perms "${ADMIN_DIR}" "${JOB_LOG_DIR}" "${BUILD_STAGE_DIR}"

echo "${ADMIN_DIR}" "${CONDA_BASE}" "${JOB_LOG_DIR}"
echo "Finished setup!"
EOF
Build command
bash << 'EOF'
set -e
REPO_PATH=/home/189/jb4202/model-release-condaenv
export SCRIPT_DIR="$REPO_PATH/scripts"
export CONDA_ENVIRONMENT="payu"
export ADMIN_DIR="/g/data/tm70/jb4202/tmp-conda/admin/conda_containers"
export CONDA_BASE="/g/data/tm70/jb4202/tmp-conda/prerelease"
export APPS_USERS_GROUP="tm70"
export APPS_OWNERS_GROUP="tm70"
PROJECT="tm70"
STORAGE="gdata/tm70"

source "${SCRIPT_DIR}"/install_config.sh
cd "${JOB_LOG_DIR}"

qsub -N build_"${CONDA_ENVIRONMENT}" -lncpus=1,mem=20GB,walltime=2:00:00,jobfs=50GB,storage="${STORAGE}" \
           -v SCRIPT_DIR,CONDA_ENVIRONMENT,ADMIN_DIR,CONDA_BASE,APPS_USERS_GROUP,APPS_OWNERS_GROUP \
           -P "${PROJECT}" -q copyq -Wblock=true -Wumask=037 \
           "${SCRIPT_DIR}"/build.sh

echo "Finished Build!"
EOF
Test command
bash << 'EOF'
set -e
REPO_PATH=/home/189/jb4202/model-release-condaenv
export SCRIPT_DIR="$REPO_PATH/scripts"
export CONDA_ENVIRONMENT="payu"
export ADMIN_DIR="/g/data/tm70/jb4202/tmp-conda/admin/conda_containers"
export CONDA_BASE="/g/data/tm70/jb4202/tmp-conda/prerelease"
export APPS_USERS_GROUP="tm70"
export APPS_OWNERS_GROUP="tm70"
PROJECT="tm70"
STORAGE="gdata/tm70"

source "${SCRIPT_DIR}"/install_config.sh
cd "${JOB_LOG_DIR}"

qsub -N test_"${CONDA_ENVIRONMENT}" -lncpus=4,mem=20GB,walltime=0:20:00,jobfs=50GB,storage="${STORAGE}" \
           -v SCRIPT_DIR,CONDA_ENVIRONMENT,ADMIN_DIR,CONDA_BASE,APPS_USERS_GROUP,APPS_OWNERS_GROUP \
           -P "${PROJECT}" -Wblock=true -Wumask=037 \
           "${SCRIPT_DIR}"/test.sh

echo "Finished Test!"
EOF
Deploy command
bash << 'EOF'
set -e
REPO_PATH=/home/189/jb4202/model-release-condaenv
export SCRIPT_DIR="$REPO_PATH/scripts"
export CONDA_ENVIRONMENT="payu"
export ADMIN_DIR="/g/data/tm70/jb4202/tmp-conda/admin/conda_containers"
export CONDA_BASE="/g/data/tm70/jb4202/tmp-conda/prerelease"
export APPS_USERS_GROUP="tm70"
export APPS_OWNERS_GROUP="tm70"

source "${SCRIPT_DIR}"/install_config.sh

"${SCRIPT_DIR}"/deploy.sh

echo "Finished Deploy!"
EOF

Once everything was deployed, I tested the modules by manually running the configuration repro tests (instructions here: https://github.com/ACCESS-NRI/model-config-tests/?tab=readme-ov-file#how-to-run-pytests-manually-on-nci) with module load conda/payu-1.1.5 (tested an ACCESS-OM2 configuration (tag: release-1deg_jra55_ryf-2.0) and an ACCESS-ESM1.5 configuration (tag: release-historical+concentrations-1.1)). I also tested that the payu commands run fine when run directly, and did the same for the payu-dev environment.

With the workflows, in pull_request.yml, I've tested the build_base_image job, which builds the container .sif file and uploads/downloads the artefact, on a private test repository. What has not really been tested is whether the GitHub vars are all correctly set and used. A test organisation probably wouldn't be a bad idea for checking those.

One thing that should be edited if deploying to Gadi is secrets.REPO_PATH - maybe to a temporary directory in the CI user's home directory or on scratch (if scratch, the storage flags (vars.STORAGE) for PBS jobs might need to include it).

… matrix

- Fixed matrix to include changed environments that are substrings of others (e.g. payu and payu-dev)
This is to reduce the number of signoffs required for pull request and deploy jobs, so it's just one per modified environment
@jo-basevi (Collaborator, Author)

I've been testing the CI/CD workflows on a separate test-organisation repository (https://github.com/jbcv-test-org/test-repository/actions). This includes:

  • The pull request workflow (which builds a container on a GitHub runner, syncs the repository down to Gadi, runs the build and test scripts - which build a squashfs file for the conda environment and create launcher scripts and module files - and tars the files to a staging directory on Gadi)
  • Deployment once the PR is merged (which untars the files in the staging directory and rsyncs them to the release location)
  • Manual update (runs the build, test and deploy steps). I tested this by manually running the workflow from a branch that had package versions from an earlier payu conda-lock file (though in future it may be necessary to use the conda-lock tool to fully replicate old environments)

Some new code changes:

  • Added a fix so both payu and payu-dev are picked up in the changed-environments workflow (get_changed_env.yml). Also added a small simplification: the build matrix is now a list rather than a dictionary.
  • Removed the default payu module alias for module load conda_container. There's still a payu alias, so module load conda_container/payu would load conda_container/payu-1.1.5.
  • Merged the workflow jobs that run on the environment into one job, to avoid multiple deployment-to-environment requests (as the GitHub Environment is now used for Gadi configuration settings). It feels like a bit of a crime to merge workflows that had been nicely factored into setup, build, test and deploy, but this prevents requiring multiple signoffs per job (e.g. this PR, which modifies the payu and payu-dev environments, now requires two signoffs per pull request (one per environment) vs. five (setup, build payu, build payu-dev, test payu, test payu-dev)).
  • Note: secrets.REPO_PATH (the directory the repository is synced to, and from which the build/test/deploy scripts are run) needs to be under /home/ or scratch: during the build scripts, the /g directory is mounted to a /jobfs directory, so scripts under /g would not be accessible.

@aidanheerdegen (Member)

  • Removed default payu module alias for module load conda_container module. There's still a payu alias so module load conda_container/payu would load conda_container/payu-1.1.5

I'm not a fan of the verbosity of this. Can we just module load payu like before?

@aidanheerdegen (Member)

APPS_OWNERS_GROUP: vk83_w ? (Read/write/executable permissions for installed files)

I'd recommend tm70_ci for all the normal reasons, but especially so with conda environments, where it is so easy to conda install something thinking you're in a different environment, and if you have write permissions ... well ... enough said.

@aidanheerdegen (Member) left a comment:

LGTM. It is a pretty damn complicated set of interconnected scripts etc. Give it a burl and see how it runs ... but of course I still have questions.

export CONDA_TEMP_PATH="${PBS_JOBFS:-${CONDA_TEMP_PATH}}"
export SCRIPT_DIR="${SCRIPT_DIR:-$PWD}"

export SCRIPT_SUBDIR="apps/cms_conda_scripts"
export SCRIPT_SUBDIR="apps/conda_scripts"
Member:

Is this a path in the container? If so, it would be good to have a comment to that effect. I get quite lost with all the paths etc.

Collaborator (Author):

So this script directory sits outside the container, and contains the environment launcher scripts (one for every file on $PATH inside the squashfs environment) that launch a container and run commands inside the containerised environment. I've added some brief documentation to this file in b4d57b9.

Member:

This is not the same as the payu-dev version. Is there a reason for that?

Collaborator (Author):

Yes - payu-dev has a pip install of payu, so the payu entry-point scripts (e.g. payu-run, payu-collate) have incorrect python shebang headers (pointing to the location on /jobfs/ where the environment was built). Technically payu/build_inner.sh could be changed to be the same as payu-dev/build_inner.sh, as changing the python headers should have no effect.

@jo-basevi (Collaborator, Author)

I'm not a fan of the verbosity of this. Can we just module load payu like before?

Yeah, I agree that it is overly verbose - I was just naming it conda_container to separate it from the other conda-pack payu environments for now, though figuring out some consistency in naming would be good. The MED conda environments are in /g/data/xp65/public/modules and follow a naming scheme similar to hh5's environments, e.g. conda/access-med-0.10 and conda/esmvaltool-0.4. There's a pre-existing conda directory in /g/data/vk83/modules which is currently used for the access-ram environments, so we could use the same directory, though those follow a slightly different naming scheme - conda/access-ram/2024.11.1. So we could have conda/payu/VERSION; however, that would mix squashfs conda environments and conda-packed environments. Then again, would the idea be to eventually house environment files for various model-release conda environments in this repository? A pro of keeping conda container modules together in a conda subdirectory would be that any updates to common modulefiles (.common_v3) are set together.

To use module load payu, I think it would require some changes to the $CONDA_MODULE_PATH used in the build scripts to modify it using the environment module name - maybe in environments/payu/config.sh - and then maybe some changes to the common modulefile so it extracts the module name and uses it to find the conda squashfs file and environment inside the container.

- Add general modulepath config overrides to payu environment config file (environments/payu/config.sh)
- Update payu deploy script to use these modulepaths
- Add a MODULE_VERSION to use in the general build scripts (separate from FULLENV, which is the name of the environment in the container and squashfs files)
- Extend common modulefile to support modulefile names $ENVIRONMENT/$VERSION, as well as conda/$ENVIRONMENT-$VERSION
@jo-basevi (Collaborator, Author)

To set up module load payu / module load payu/$VERSION (previously module load conda_container/payu / module load conda_container/payu-$VERSION), I've:

  • overridden the general module-path configuration for payu in the environment configuration (environments/payu/config.sh)
  • modified the custom deploy script (environments/payu/deploy.sh) and the general build script (scripts/build.sh) to handle a module version that isn't payu-$VERSION
  • extended the common modulefile to parse the module name payu/$VERSION to obtain the payu-$VERSION conda environment name (see the sketch below)
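
A minimal bash sketch of the name mapping the modulefile now needs to support (illustrative only - the actual modulefile implements this logic in its own language):

    # Map a module name to the conda environment name (FULLENV):
    #   payu/1.1.5       -> payu-1.1.5
    #   conda/payu-1.1.5 -> payu-1.1.5
    module_name="payu/1.1.5"
    if [[ "${module_name}" == conda/* ]]; then
        FULLENV="${module_name#conda/}"
    else
        FULLENV="${module_name/\//-}"
    fi
    echo "${FULLENV}"   # prints payu-1.1.5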

I've kept the general module-name configuration (e.g. $MODULE_NAME, $CONDA_MODULE_PATH) in scripts/install_config.sh and in the common modulefile, to allow modules for future environments to be grouped under a conda subdirectory. I'm a bit worried that I'm just making the code more complicated than needed...

I've removed the payu/dev environment so it doesn't clash with the existing payu/dev module in pre-release. Once automatic updates are implemented (in this issue: #2), this environment can be added back in and will replace the pre-existing payu/dev environment.

I think once there's a new version of payu (1.1.6?), it would be great to release it to prerelease for testing. I've re-done some configuration checks with the latest changes to confirm that, at least for an ACCESS-ESM1.5 configuration, the containerised payu reproduces a model run.

@CodeGat CodeGat self-requested a review December 16, 2024 23:53
@jo-basevi (Collaborator, Author)

I've noticed the deploy rsync command changes the ACLs and permissions on the /apps subdirectory ($APPS_SUBDIR). This is due to permissions and ACLs being set on files in ${CONDA_TEMP_PATH}, which then get rsynced across with permissions and ACLs preserved (--perms is included in --archive, plus --acls). E.g. one of the rsync commands in scripts/deploy.sh:

rsync --archive --verbose --partial --progress --one-file-system \
      --itemize-changes --hard-links --acls --relative -- \
      "${CONDA_TEMP_PATH}"/./"${APPS_SUBDIR}"/"${CONDA_INSTALL_BASENAME}" \
      "${CONDA_TEMP_PATH}"/./"${MODULE_SUBDIR}" \
      "${CONDA_TEMP_PATH}"/./"${SCRIPT_SUBDIR}" \
      "${CONDA_BASE}"

In testing on a separate directory, I tried to set up ACLs and permissions similar to /g/data/vk83/prerelease, with a pre-existing apps directory, and changed the rsync commands to remove --perms, --group and --acls:

rsync --archive --no-perms --no-group --no-owner --verbose --partial --progress \
      --one-file-system --itemize-changes --hard-links --relative -- \
      "${CONDA_TEMP_PATH}"/./"${APPS_SUBDIR}"/"${CONDA_INSTALL_BASENAME}" \
      "${CONDA_TEMP_PATH}"/./"${MODULE_SUBDIR}" \
      "${CONDA_TEMP_PATH}"/./"${SCRIPT_SUBDIR}" \
      "${CONDA_BASE}"

The above seems to allow ACLs and permissions to be inherited from the pre-existing /apps subdirectory. The pre-existing apps directory ACL (/g/data/vk83/prerelease/apps) does allow write access to vk83_w. Referencing Aidan's previous comment:

APPS_OWNERS_GROUP: vk83_w ? (Read/write/executable permissions for installed files)

I'd recommend tm70_ci for all the normal reasons, but especially so with conda environments, where it is so easy to conda install something thinking you're in a different environment, and if you have write permissions ... well ... enough said.

It shouldn't be possible to install anything into the conda environment, as it's a squashfs file, so it should be read-only.
Though to have more restrictive ACLs, and for rsync to preserve ACLs and permissions, I could break up the rsyncs to target the apps and modules subdirectories; e.g. the above rsync could become something like the following:

rsync --archive --verbose --partial --progress --one-file-system \
      --itemize-changes --hard-links --acls --relative -- \
      "${CONDA_TEMP_PATH}"/"${APPS_SUBDIR}"/./"${CONDA_INSTALL_BASENAME}" \
      "${CONDA_TEMP_PATH}"/"${APPS_SUBDIR}"/./"${SCRIPT_SUBDIR}" \
      "${CONDA_BASE}"/"${APPS_SUBDIR}"

rsync --archive --verbose --partial --progress --one-file-system \
      --itemize-changes --hard-links --acls --relative -- \
      "${CONDA_TEMP_PATH}"/"${MODULE_SUBDIR}"/./"${MODULE_NAME}" \
      "${CONDA_BASE}"/"${MODULE_SUBDIR}"

Does anyone have a preference between stripping out ACLs and permissions completely and relying on the pre-existing ACLs (or manually adding more restrictive ACLs), versus breaking up the rsyncs to preserve ACLs and permissions only on the directories relevant to the conda installs?
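
For the "manually adding more restrictive ACLs" option, a minimal sketch (group names are taken from the discussion above; the exact ACL layout is an assumption, not a settled decision):

# Owners/CI group gets write access; the users group gets read/execute,
# recursively on the deployed apps directory (illustrative path and groups)
setfacl -R -m g:tm70_ci:rwX -m g:vk83:rX /g/data/vk83/prerelease/apps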

@CodeGat (Member) left a comment:

Approve the change to APPS_OWNER over APPS_OWNERS_GROUP, given tm70_ci is probably going to be the point of contact for this stuff.

@jo-basevi jo-basevi merged commit 378e8f9 into main Dec 19, 2024
3 checks passed