A set of scripts and tools for the analysis of viral NGS data.
Workflows are written in WDL, a portable workflow language that can be executed on a wide variety of platforms:
- on individual machines (using miniWDL or Cromwell to execute)
- on commercial cloud platforms like GCP, AWS, or Azure (using Cromwell or CromwellOnAzure)
- on institutional HPC systems (using Cromwell)
- on commercial platform-as-a-service vendors (like DNAnexus)
- on academic cloud platforms (like Terra)
Workflows from this repository are continuously deployed to Dockstore, a GA4GH Tool Registry Service (TRS). From there, they can be imported into any bioinformatics compute platform that supports the TRS API and understands WDL (including Terra, DNAnexus, and DNAstack).
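The TRS API can also be queried directly to retrieve a workflow's primary WDL descriptor. The sketch below assumes Dockstore's public TRS v2 endpoint and the standard TRS path layout (tools/{id}/versions/{version}/PLAIN_WDL/descriptor). The tool ID used in the usage example is hypothetical; copy the real one from the workflow's Dockstore page. The helper names urlencode and trs_descriptor_url are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: build a GA4GH TRS v2 URL for fetching a WDL descriptor from Dockstore.
# ASSUMPTION: Dockstore's public TRS v2 base URL; tool IDs passed in are placeholders.
TRS_BASE="https://dockstore.org/api/ga4gh/trs/v2"

# Percent-encode every character except RFC 3986 unreserved ones.
# Handles ASCII input only, which covers typical Dockstore tool IDs.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [[:alnum:].~_-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# trs_descriptor_url TOOL_ID VERSION
# PLAIN_WDL requests the raw WDL text instead of a JSON wrapper.
trs_descriptor_url() {
  printf '%s/tools/%s/versions/%s/PLAIN_WDL/descriptor\n' \
    "$TRS_BASE" "$(urlencode "$1")" "$(urlencode "$2")"
}
```

Usage, with a hypothetical tool ID: curl -fsSL "$(trs_descriptor_url '#workflow/github.com/broadinstitute/viral-ngs/assemble_refbased' master)" -o assemble_refbased.wdl. This fetches only the primary descriptor, not any files it imports.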
Flattened workflows are also continuously deployed to a staging GitHub repository (viral-ngs-staging) and to a GCS bucket (gs://viral-ngs-wdl), from which they can be downloaded for local use.
Workflows are also available in the Terra featured workspace.
Workflows are continuously deployed to a DNAnexus CI project.
The easiest way to get started is on a single, Docker-capable machine (your laptop, shared workstation, or virtual machine) using miniWDL. MiniWDL can be installed either via pip (pip install miniwdl) or via conda from the conda-forge channel (conda install -c conda-forge miniwdl). After confirming that it works (miniwdl run_self_test), you can use miniwdl run to invoke WDL workflows from this repository.
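Before running the self-test, it can help to confirm that the required executables are on your PATH. This is a minimal sketch assuming bash and miniWDL's default Docker backend; check_miniwdl_prereqs is a hypothetical helper, not part of miniWDL, and it only checks that the commands exist. miniwdl run_self_test remains the real end-to-end check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any missing executables needed for a local
# miniWDL run with the Docker backend. Returns non-zero if any are missing.
check_miniwdl_prereqs() {
  local tool missing=0
  for tool in docker miniwdl; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not found on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_miniwdl_prereqs && miniwdl run_self_test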
For example, to list the inputs of the assemble_refbased workflow, run it without supplying any:
miniwdl run https://raw.githubusercontent.com/broadinstitute/viral-ngs-staging/master/pipes/WDL/workflows/assemble_refbased.wdl
This will emit:
missing required inputs for assemble_refbased: reads_unmapped_bams, reference_fasta
required inputs:
  Array[File]+ reads_unmapped_bams
  File reference_fasta
optional inputs:
  <really long list>
outputs:
  <really long list>
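To inspect other workflows the same way, a small helper can build the raw URL from a workflow name. This sketch takes the base path from the assemble_refbased URL above and assumes that other workflows live in the same directory as <name>.wdl; wdl_url is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Directory of flattened workflows in the staging repo (from the example URL).
# ASSUMPTION: every workflow is published here as <name>.wdl.
VIRAL_NGS_WDL_BASE="https://raw.githubusercontent.com/broadinstitute/viral-ngs-staging/master/pipes/WDL/workflows"

# wdl_url WORKFLOW_NAME -> raw URL of WORKFLOW_NAME.wdl
wdl_url() {
  printf '%s/%s.wdl\n' "$VIRAL_NGS_WDL_BASE" "$1"
}
```

Usage: miniwdl run "$(wdl_url assemble_refbased)"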
To execute this workflow on your local machine, invoke it like this:
miniwdl run \
https://raw.githubusercontent.com/broadinstitute/viral-ngs-staging/master/pipes/WDL/workflows/assemble_refbased.wdl \
reads_unmapped_bams=PatientA_library1.bam \
reads_unmapped_bams=PatientA_library2.bam \
reference_fasta=/refs/NC_045512.2.fasta \
trim_coords_bed=/refs/NC_045512.2-artic_primers-3.bed \
sample_name=PatientA
In the above example, reads from two sequencing runs are aligned and merged together before consensus calling; array inputs such as reads_unmapped_bams are supplied by repeating the parameter once per file. The optional BED file turns on primer trimming at the given coordinates.
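The same inputs can instead be kept in a JSON file, which is easier to version and reuse across engines. This sketch assumes miniwdl run's -i option for a Cromwell-style inputs file, in which each key is prefixed with the workflow name; the file name assemble_refbased.inputs.json is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Write the command-line inputs from the example above as a Cromwell-style
# inputs file (keys namespaced by the workflow name).
cat > assemble_refbased.inputs.json <<'EOF'
{
  "assemble_refbased.reads_unmapped_bams": [
    "PatientA_library1.bam",
    "PatientA_library2.bam"
  ],
  "assemble_refbased.reference_fasta": "/refs/NC_045512.2.fasta",
  "assemble_refbased.trim_coords_bed": "/refs/NC_045512.2-artic_primers-3.bed",
  "assemble_refbased.sample_name": "PatientA"
}
EOF
```

Usage: miniwdl run https://raw.githubusercontent.com/broadinstitute/viral-ngs-staging/master/pipes/WDL/workflows/assemble_refbased.wdl -i assemble_refbased.inputs.json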
The workflows provided here are more fully documented at our ReadTheDocs page.