This repository will contain processing code for GENIE sponsored projects used to for creating cBioPortal files
-
Close the repository and navigate to the
READMD.md
in your local directory -
Build the environment using
pip install -e .
-
Install Docker: https://docs.docker.com/get-docker/
-
Clone this repository and navigate to the directory
git clone [email protected]:Sage-Bionetworks/GENIE-Sponsored-Projects.git
cd GENIE-Sponsored-Projects
- Build the container locally.
docker build -t geniesp .
Cache your Synapse personal access token (PAT) as an environmental variable:
export SYNAPSE_AUTH_TOKEN={your_personal_access_token_here}
To view usage details, run
docker run --rm geniesp -h
Output will be as follows
usage: geniesp [-h] [--staging]
{NSCLC,CRC,BrCa,PANC,Prostate,AKT1,ERRB2,FGFR4} release
Run GENIE sponsored projects
positional arguments:
{NSCLC,CRC,BrCa,PANC,Prostate,AKT1,ERRB2,FGFR4}
Specify project to run
release Specify bpc release (e.g. 1.1-consortium)
optional arguments:
-h, --help Show this help message and exit
--upload Upload files into Synapse BPC staging directory. Default: false
--cbioportal {synapseID}
Optional parameter to specify cbioportal folder
location
--production Whether to run in production mode or not. Default: false
--use-grs Whether to use grs or use dd as primary mapping. Default: false
Example command line:
This runs the release pipeline for BLADDER 1.1 in non-production mode (staging) with GRS enabled.
geniesp BLADDER 1.1-consortium --upload --use-grs
Example command using docker:
This runs the release pipeline for PANC 1.1 in non-production mode (staging).
docker run --rm -e SYNAPSE_AUTH_TOKEN=$SYNAPSE_AUTH_TOKEN geniesp geniesp PANC 1.1-consortium --upload
To validate a cBioPortal mapping file stored on synapse:
python validate_map.py -s {synapse_id} -c {cohort} -r {release} -l info
or stored in a local file:
python validate_map.py -f {/path/to/file.csv} -c {cohort} -r {release} -l info
To view full usage details:
python validate_map.py -h
which outputs:
usage: validate_map.py [-h] [--synapse_id SYNAPSE_ID | --file FILE] [--version VERSION] [--cohort {BLADDER,BRCA,CRC,NSCLC,PANC,PROSTATE}]
[--release {1.1-consortium,1.2-consortium,2.0-public,2.1-consortium}] [--outfile OUTFILE] [--log {debug,info,warning,error}]
Checks validity of BPC to cBioPortal mapping file
optional arguments:
-h, --help show this help message and exit
--synapse_id SYNAPSE_ID, -s SYNAPSE_ID
Synapse ID of mapping file
--file FILE, -f FILE Local path to mapping file
--version VERSION, -v VERSION
Synapse entity version number (default: current)
--cohort {BLADDER,BRCA,CRC,NSCLC,PANC,PROSTATE}, -c {BLADDER,BRCA,CRC,NSCLC,PANC,PROSTATE}
BPC cohort label (default: BLADDER)
--release {1.1-consortium,1.2-consortium,2.0-public,2.1-consortium}, -r {1.1-consortium,1.2-consortium,2.0-public,2.1-consortium}
Release label (default: 1.1-consortium)
--outfile OUTFILE, -o OUTFILE
Name of output file (default: output.csv)
--log {debug,info,warning,error}, -l {debug,info,warning,error}
Set logging output level (default: error)
The most common issues when running GENIE-Sponsored-Projects code for BPC involve changes to variable names of the underlying source data and outdated or incorrect references.
Variable references
- syn22294851: the Scope of Release maintains a running log of changes to derived variable names
- syn17011602: curated variable names are listed in data dictionaries for each cohort
If obtaining an error that a variable cannot be found, check and update the following references on Synapse:
- syn25712693: BPC REDCap to cbio mapping
- syn22296821: Dataset labels in Data files for derived variables
On rare occassions, variable name changes may also require changes in the codebase:
- Check for any hardcoded variable names have been updated
Finally, some information is not collected for particular cohorts.
- syn20852283: check if data is collected for a given cohort by investigating raw file uploads