Commit

Merge pull request nf-core#505 from nf-core/refactor-references
Refactor references
rannick authored Dec 20, 2024
2 parents a815895 + 31b400a commit 31d29aa
Showing 96 changed files with 2,021 additions and 1,204 deletions.
17 changes: 0 additions & 17 deletions .github/workflows/awsfulltest.yml
@@ -46,7 +46,6 @@ jobs:
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
"build_references": true
}
profiles: test_full,aws_tower
- uses: actions/upload-artifact@v4
@@ -55,19 +54,3 @@ jobs:
path: |
seqera_platform_action_*.log
seqera_platform_action_*.json
- name: Launch run workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}",
"genomes_base": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}/references",
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
}
profiles: test_full,aws_tower
27 changes: 1 addition & 26 deletions .github/workflows/awstest.yml
@@ -26,8 +26,7 @@ jobs:
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
"stub": true,
"build_references": true
"stub": true
}
profiles: test,aws_tower
- uses: actions/upload-artifact@v4
@@ -36,27 +35,3 @@ jobs:
path: |
tower_action_*.log
tower_action_*.json
- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}",
"genomes_base": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}/references",
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
"stub": true
}
profiles: test,aws_tower
- uses: actions/upload-artifact@v4
with:
name: Seqera Platform debug log file
path: |
seqera_platform_action_*.log
seqera_platform_action_*.json
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
@@ -39,7 +39,6 @@ jobs:
- "latest-stable"
test_profile:
- "test_stub"
- "test_build"
compute_profile:
- "docker"
- "singularity"
14 changes: 11 additions & 3 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add nf-test to local module: `STARFUSION_BUILD`. [#585](https://github.com/nf-core/rnafusion/pull/585)
- Add nf-test to local module: `STARFUSION_DETECT`. [#586](https://github.com/nf-core/rnafusion/pull/586)
- Added a new module `CTATSPLICING_STARTOCANCERINTRONS` and a new parameter `--ctatsplicing`. This options creates reports on cancer splicing abberations and requires one or both of `--arriba` and `--starfusion` to be given. [#587](https://github.com/nf-core/rnafusion/pull/587)
- Add parameter `--references_only` when no data should be analysed, but only the references should be built [#505](https://github.com/nf-core/rnafusion/pull/505)

### Changed

@@ -34,6 +35,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Remove double nested folder introduced in [#577](https://github.com/nf-core/rnafusion/pull/577), [#581](https://github.com/nf-core/rnafusion/pull/581)
- Use docker.io and galaxy containers for fusioncatcher and starfusion (incl. fusioninspector) instead of wave as they are not functional on wave [#588](https://github.com/nf-core/rnafusion/pull/588)
- Update STAR-Fusion to 1.14 [#588](https://github.com/nf-core/rnafusion/pull/588)
- Use "-genePredExt -geneNameAsName2 -ignoreGroupsWithoutExons" (to mimic gms/tomte) for GTF_TO_REFFLAT [#505](https://github.com/nf-core/rnafusion/pull/505)
- Integrate reference building in the main workflow [#505](https://github.com/nf-core/rnafusion/pull/505)
- Move from ensembl to gencode base [#505](https://github.com/nf-core/rnafusion/pull/505)
- Update from ensembl 102 to gencode 46 default references [#505](https://github.com/nf-core/rnafusion/pull/505)

### Fixed

@@ -48,12 +53,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Removed

- Remove fusionGDB from documentation and fusion-report download stubs [#503](https://github.com/nf-core/rnafusion/pull/503)
- Removed test-build as reference building gets integrated in the main workflow [#505](https://github.com/nf-core/rnafusion/pull/505)
- Removed parameter `--build_references`

### Parameters

| Old parameter | New parameter |
| ------------- | ------------- |
| | `--no_cosmic` |
| Old parameter | New parameter |
| -------------------- | ------------------- |
| | `--no_cosmic` |
| `--build_references` | `--references_only` |

## v3.0.2 - [2024-04-10]

8 changes: 4 additions & 4 deletions README.md
@@ -19,7 +19,7 @@

## Introduction

**nf-core/rnafusion** is a bioinformatics best-practice analysis pipeline for RNA sequencing consisting of several tools designed for detecting and visualizing fusion genes. Results from up to 5 fusion callers tools are created, and are also aggregated, most notably in a pdf visualiation document, a vcf data collection file, and html and tsv reports.
**nf-core/rnafusion** is a bioinformatics best-practice analysis pipeline for RNA sequencing consisting of several tools designed for detecting and visualizing fusion genes. Results from up to 5 fusion callers tools are created, and are also aggregated, most notably in a pdf visualisation document, a vcf data collection file, and html and tsv reports.

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/rnafusion/results).

@@ -31,9 +31,9 @@ In rnafusion the full-sized test includes reference building and fusion detectio

### Build references

`--build_references` triggers a parallel workflow to build references, which is a prerequisite to running the pipeline:
`--references_only` triggers a workflow to ONLY build references, otherwise the references are build when the analysis is run:

1. Download ensembl fasta and gtf files
1. Download gencode fasta and gtf files
2. Create [STAR](https://github.com/alexdobin/STAR) index
3. Download [Arriba](https://github.com/suhrig/arriba) references
4. Download [FusionCatcher](https://github.com/ndaniel/fusioncatcher) references
@@ -78,7 +78,7 @@ First, build the references:
nextflow run nf-core/rnafusion \
-profile test,<docker/singularity/.../institute> \
--outdir <OUTDIR>\
--build_references \
--references_only \
-stub
```

74 changes: 42 additions & 32 deletions conf/modules.config
@@ -18,10 +18,6 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: GFFREAD {
ext.args = '-w -S'
}

withName: 'ARRIBA_ARRIBA' {
publishDir = [
path: { "${params.outdir}/arriba" },
@@ -40,7 +36,7 @@ process {
}

withName: 'ARRIBA_VISUALISATION' {
ext.when = { !params.fusioninspector_only && (params.starfusion || params.all) }
ext.when = { {!params.fusioninspector_only} && ({params.starfusion} || {params.all}) }
ext.prefix = { "${meta.id}_combined_fusions_arriba_visualisation" }
publishDir = [
path: { "${params.outdir}/arriba_visualisation" },
@@ -73,9 +69,9 @@ process {
]
}

withName: 'ENSEMBL_DOWNLOAD' {
withName: 'GENCODE_DOWNLOAD' {
publishDir = [
path: { "${params.genomes_base}/ensembl" },
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
@@ -87,7 +83,7 @@ process {

withName: 'FASTQC' {
ext.args = '--quiet'
ext.when = { !params.skip_qc }
ext.when = {!params.skip_qc}
publishDir = [
path: { "${params.outdir}/fastqc" },
mode: params.publish_dir_mode,
@@ -97,6 +93,7 @@ process {

withName: 'FASTQC_FOR_FASTP' {
ext.args = '--quiet'
ext.when = { !params.skip_qc }
ext.prefix = { "${meta.id}_trimmed" }
publishDir = [
path: { "${params.outdir}/fastqc_for_fastp" },
@@ -119,7 +116,7 @@ process {

withName: 'FUSIONINSPECTOR' {
ext.when = { !params.skip_vis }
ext.args = { params.fusioninspector_limitSjdbInsertNsj != 1000000 ? "--STAR_xtra_params \"--limitSjdbInsertNsj ${params.fusioninspector_limitSjdbInsertNsj}\"" : '' }
ext.args = { ${params.fusioninspector_limitSjdbInsertNsj} != 1000000 ? "--STAR_xtra_params \"--limitSjdbInsertNsj ${params.fusioninspector_limitSjdbInsertNsj}\"" : '' }
ext.args2 = '--annotate --examine_coding_effect'
}

@@ -146,12 +143,39 @@ process {

withName: 'GATK4_BEDTOINTERVALLIST' {
publishDir = [
path: { "${params.genomes_base}/ensembl" },
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'GATK4_MARKDUPLICATES' {
ext.when = { {!params.skip_qc} && {!params.fusioninspector_only} && ( {params.starfusion}|| {params.all}) }
publishDir = [
path: { "${params.outdir}/picard" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'GFFREAD' {
ext.args = { '-w -S' }
publishDir = [
path: { "${params.genomes_base}/gffread" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'GTF_TO_REFFLAT' {
ext.args = "-genePredExt -geneNameAsName2 -ignoreGroupsWithoutExons"
publishDir = [
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'HGNC_DOWNLOAD' {
publishDir = [
path: { "${params.genomes_base}/hgnc" },
@@ -161,7 +185,7 @@ process {
}
withName: 'MULTIQC' {
ext.when = { !params.skip_qc }
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
ext.args = {params.multiqc_title} ? "--title \"$params.multiqc_title\"" : ''
publishDir = [
path: { "${params.outdir}/multiqc" },
mode: params.publish_dir_mode,
@@ -170,21 +194,12 @@ process {
}

withName: 'PICARD_COLLECTRNASEQMETRICS' {
ext.when = { !params.skip_qc && !params.fusioninspector_only && (params.starfusion || params.all) }
ext.when = { {!params.skip_qc} && {!params.fusioninspector_only} && ( {params.starfusion} || {params.all}) }

}

withName: 'GATK4_MARKDUPLICATES' {
ext.when = { !params.skip_qc && !params.fusioninspector_only && (params.starfusion || params.all) }
publishDir = [
path: { "${params.outdir}/picard" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'PICARD_COLLECTINSERTSIZEMETRICS' {
ext.when = { !params.skip_qc && !params.fusioninspector_only && (params.starfusion || params.all) }
ext.when = { ${!params.skip_qc} && ${!params.fusioninspector_only} && (${params.starfusion} || ${params.all}) }
ext.prefix = { "${meta.id}_collectinsertsize"}
publishDir = [
path: { "${params.outdir}/picard" },
@@ -215,7 +230,7 @@ process {

withName: 'SAMTOOLS_FAIDX' {
publishDir = [
path: { "${params.genomes_base}/ensembl" },
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
@@ -375,16 +390,11 @@ process {
]
}

withName: 'UCSC_GTFTOGENEPRED' {
ext.args = "-genePredExt -geneNameAsName2"
publishDir = [
path: { "${params.genomes_base}/ensembl" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'VCF_COLLECT' {
ext.when = { {!params.fusioninspector_only} && {!params.skip_vcf} }
}

withName: '.*' {
ext.when = { !params.references_only || task.process.contains('BUILD_REFERENCES') }
}
}
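
The catch-all `withName: '.*'` rule above appears to gate every process on `--references_only`: when the flag is set, only processes whose fully qualified name contains `BUILD_REFERENCES` are allowed to run. A minimal Python sketch of that predicate follows. The function name `should_run` and the process name `NFCORE_RNAFUSION:RNAFUSION:FASTQC` are hypothetical and used only for illustration, and the sketch assumes `task.process` holds the fully qualified process name.

```python
def should_run(process_name: str, references_only: bool) -> bool:
    """Mirror of the `ext.when` closure on the '.*' selector (illustrative only).

    Equivalent to: !params.references_only || task.process.contains('BUILD_REFERENCES')
    """
    return (not references_only) or ("BUILD_REFERENCES" in process_name)


# Process name taken from the docs/usage.md hunk in this commit.
build_proc = "NFCORE_RNAFUSION:RNAFUSION:BUILD_REFERENCES:FUSIONREPORT_DOWNLOAD"
# Hypothetical analysis process name, assumed for the example.
analysis_proc = "NFCORE_RNAFUSION:RNAFUSION:FASTQC"

for name in (build_proc, analysis_proc):
    for flag in (True, False):
        print(f"references_only={flag!s:5} {name}: run={should_run(name, flag)}")
```

Under these assumptions, reference-building processes run regardless of the flag, while analysis processes run only when `--references_only` is not set.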
2 changes: 2 additions & 0 deletions conf/test.config
@@ -16,6 +16,8 @@ params {

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv'
all = true
no_cosmic = true
}

// Limit and standardize resources for github actions and reproducibility
2 changes: 1 addition & 1 deletion conf/test_build.config
@@ -15,7 +15,7 @@ params {
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data
build_references = true
references_only = true
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv'
no_cosmic = true
all = true
15 changes: 7 additions & 8 deletions docs/usage.md
@@ -10,7 +10,7 @@ The pipeline is divided into two parts:

1. Download and build references

- specified with `--build_references` parameter
- specified with `--references_only` parameter
- required only once before running the pipeline
- **Important**: has to be run with each new release

@@ -32,7 +32,7 @@ The rnafusion pipeline needs references for the fusion detection tools, so downl
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references --all \
--references_only --all \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--genomes_base <PATH/TO/REFERENCES> \
--outdir <PATH/TO/REFERENCES>
@@ -43,7 +43,7 @@ References for each tools can also be downloaded separately with:
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references --<tool1> --<tool2> ... \
--references_only --<tool1> --<tool2> ... \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--genomes_base <PATH/TO/REFERENCES> \
--outdir <OUTPUT/PATH>
@@ -64,7 +64,7 @@ Use credentials from QIAGEN and add `--qiagen`
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references --<tool1> --<tool2> ... \
--references_only --<tool1> --<tool2> ... \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--genomes_base <PATH/TO/REFERENCES> \
--outdir <OUTPUT/PATH> --qiagen
@@ -81,7 +81,7 @@ If process `FUSIONREPORT_DOWNLOAD` times out, it could be due to network restric
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references \
--references_only \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--fusionreport \
--genomes_base <PATH/TO/REFERENCES> \
@@ -93,7 +93,7 @@ Where the custom configuration could look like (adaptation to local machine nece

```text
process {
withName: 'NFCORE_RNAFUSION:BUILD_REFERENCES:FUSIONREPORT_DOWNLOAD' {
withName: 'NFCORE_RNAFUSION:RNAFUSION:BUILD_REFERENCES:FUSIONREPORT_DOWNLOAD' {
memory = '8.GB'
cpus = 4
}
@@ -162,7 +162,7 @@ If you are not covered by the research COSMIC license and want to avoid using CO

> **IMPORTANT: Either `--all` or `--<tool>`** is necessary to run detection tools
`--genomes_base` should be the path to the directory containing the folder `references/` that was built with `--build_references`.
`--genomes_base` should be the path to the directory containing the folder `references/` that was built with `--references_only`.

Note that the pipeline will create the following files in your working directory:

@@ -397,7 +397,6 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- `test`
- A profile with a complete configuration for automated testing
- Includes links to test data so needs no other parameters
- Needs to run in two steps: with `--build_references` first and then without `--build_references` to run the analysis
- !!!! Run with `-stub` as all references need to be downloaded otherwise !!!!

### `-resume`