This is a Snakemake pipeline for analyzing unpaired fungal internal transcribed spacer (ITS) sequences


To install, we assume you already have installed Miniconda3 (

  • Clone the repository:
git clone
  • Create a conda environment and install the required packages:
cd PCMP_ITS_pipeline
conda create -n PCMP_ITS_pipeline --channel bioconda --channel conda-forge --channel defaults python=3.10
conda install --name PCMP_ITS_pipeline --file requirements.txt

Required input files for the pipeline

To run the pipeline, we need

  • Multiplexed or demultiplexed Illumina reads

How to run

  • Create a project directory, e.g. /scr1/users/tuv/ITS_Run1
  • Copy the files from this repository into that directory
  • Edit project_config.yml so that it suits your project. In particular,
    • all: project_dir: Path to the project directory, e.g. "/scr1/users/tuv/ITS_Run1"
    • all: mux_dir: Directory containing multiplexed Illumina sequencing reads, which does not have to be in the project directory, e.g. "/path/to/mux_files"; if samples are already demultiplexed, just fill in demux_dir
    • all: demux_dir: Leave blank if you want to demultiplex using this pipeline; otherwise, the directory containing demultiplexed R1/R2 read pairs, which does not have to be in the project directory
    • all: mapping_file: Mapping file of samples with barcode information for demultiplexing
    • all: forward_direction: TRUE to use the forward read in this pipeline, FALSE to use the reverse read
    • demux: mismatch: Number of allowable basepair mismatches on barcode sequence for demultiplexing
    • demux: revcomp: If TRUE, reverse complement barcode sequence before demultiplexing
    • trim: f_primer: Sequence of forward primer used for ITS PCR
    • trim: r_primer: Sequence of reverse primer used for ITS PCR
    • trim: mismatch: Number of allowable basepair mismatches on ITS PCR primers for trimming
    • trim: min_length: Minimum length of match during the partial matching stage
    • trim: align_id: Minimum percent identity to consider a primer match in vsearch alignment
    • otu: expected_error: Threshold for truncating reads
    • otu: otu_id: Percent sequence identity for clustering reads into OTUs
    • otu: threads: Number of threads to use
    • otu: chimera_db: Path to UCHIME reference dataset for chimera detection (see; leave blank if using mock DNA amplified with chimera primers
    • blastn: ncbi_db: Path to a local ncbi nt database
    • brocc: taxonomy_db: Path to brocc nt database created with brocc's
  • Edit config.yml:
    • cluster: and default-resources: to match your Slurm cluster setup
    • configfile: Absolute path to the project_config.yml above
  • To run the pipeline, simply run bash run_snakemake.bash
  • You can use the files in the Rmd folder to create a basic bioinformatic report from the results (this assumes some familiarity with RStudio / R)
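
The relationship between mux_dir and demux_dir described above can be sketched in a few lines of Python. This is only an illustration: the dictionary mirrors the all: section of project_config.yml using the key names from the list, the mapping file name is a hypothetical placeholder, and needs_demultiplexing is a made-up helper, not a function of the pipeline.

```python
# Hypothetical stand-in for the "all:" section of project_config.yml.
# Key names come from the list above; the values are placeholders.
project_config = {
    "all": {
        "project_dir": "/scr1/users/tuv/ITS_Run1",
        "mux_dir": "/path/to/mux_files",
        "demux_dir": "",  # blank: let the pipeline demultiplex
        "mapping_file": "mapping_file.tsv",  # hypothetical file name
        "forward_direction": True,
    }
}


def needs_demultiplexing(config):
    """Return True when demux_dir is blank, so reads must be demultiplexed.

    Raises ValueError when neither mux_dir nor demux_dir is filled in.
    """
    all_cfg = config["all"]
    if all_cfg.get("demux_dir"):
        # Samples are already demultiplexed; mux_dir is not needed.
        return False
    if not all_cfg.get("mux_dir"):
        raise ValueError("fill in either all: mux_dir or all: demux_dir")
    return True


print(needs_demultiplexing(project_config))
```

With demux_dir left blank and mux_dir set, the helper reports that demultiplexing is needed; filling in demux_dir flips the result.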

The notes on BROCC may be used to install a local taxonomy database for faster processing



Demultiplexing

Input: Multiplexed Illumina sequencing files
Output: manifest.csv, total_read_counts.tsv, demultiplexed fastq files
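
To make the demux: mismatch and demux: revcomp settings concrete, here is a minimal sketch of barcode matching. It is not the pipeline's demultiplexing code; the function names, sample IDs, and barcodes are all hypothetical, and a real demultiplexer handles details (quality, ambiguous bases, file I/O) that are omitted here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, mismatch=0, revcomp=False):
    """Return the single sample whose barcode matches, or None.

    barcodes maps sample ID -> barcode sequence. A read that matches
    no barcode, or more than one, is left unassigned.
    """
    hits = []
    for sample, barcode in barcodes.items():
        target = reverse_complement(barcode) if revcomp else barcode
        if len(index_read) < len(target):
            continue
        if count_mismatches(index_read[:len(target)], target) <= mismatch:
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None


# Hypothetical barcodes for two samples.
barcodes = {"S1": "AACCGGTA", "S2": "TTGGCCAA"}
print(assign_sample("AACCGGTT", barcodes, mismatch=1))
```

Raising mismatch recovers more reads but increases the chance that a read matches two barcodes, in which case this sketch leaves it unassigned.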


Quality control (FastQC)

Input: demultiplexed fastq files OR output reads from Primer trimming
Output: FastQC reports, concatenated fastqc_quality.tsv with all quality scores
Note: FastQC is run at the beginning and end of quality control to show differences

Primer trimming

Removes ITS forward and reverse primer sequences from reads
Output: reads/(reads.log, top_{rf}_seqs_trimmed.txt, {rf}_trimmed_removed_counts.txt, {rf}_trimmed.fastq)
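
As a rough illustration of what trim: mismatch controls, the sketch below removes a forward primer and the reverse complement of a reverse primer from a read, tolerating a fixed number of mismatches. It is a simplification under stated assumptions: the function names and sequences are hypothetical, and the partial-matching and vsearch alignment stages governed by min_length and align_id are not modeled.

```python
def find_primer(read, primer, max_mismatch):
    """Return the first index where primer matches read, else -1.

    A window counts as a match when it differs from the primer at no
    more than max_mismatch positions.
    """
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        if sum(a != b for a, b in zip(window, primer)) <= max_mismatch:
            return start
    return -1


def trim_primers(read, f_primer, r_primer_rc, max_mismatch=0):
    """Strip the forward primer (and anything before it) and the
    reverse-complemented reverse primer (and anything after it)."""
    start = find_primer(read, f_primer, max_mismatch)
    if start != -1:
        read = read[start + len(f_primer):]
    end = find_primer(read, r_primer_rc, max_mismatch)
    if end != -1:
        read = read[:end]
    return read


# Toy sequences, not real ITS primers.
read = "GGAAC" + "TTTTGGGGCCCC" + "CATGA"
print(trim_primers(read, "GGAAC", "CATGA"))
```

Reads in which neither primer is found are returned unchanged by this sketch.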

OTU clustering

Create OTUs from amplicons using vsearch. Singletons are excluded when creating the OTUs but are still included in the counts.
Rules are based on this wiki: (
Output: otu/otu_sorted.tsv
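
The otu: expected_error setting refers to the expected number of errors in a read, which is the sum of the per-base error probabilities 10^(-Q/10) derived from the Phred quality scores. The sketch below computes it and truncates a read so its total stays at or below a threshold. It assumes Phred+33 quality encoding; the function names are hypothetical, and how the pipeline's vsearch rules apply the threshold may differ.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def truncate_by_expected_error(seq, quality, max_ee, offset=33):
    """Keep the longest prefix whose expected errors stay <= max_ee."""
    total = 0.0
    keep = 0
    for ch in quality:
        total += 10 ** (-(ord(ch) - offset) / 10)
        if total > max_ee:
            break
        keep += 1
    return seq[:keep], quality[:keep]


# "I" encodes Q40 (error probability 0.0001); "+" encodes Q10 (0.1).
print(expected_errors("II++"))
print(truncate_by_expected_error("ACGT", "II++", max_ee=0.15))
```

In the example, the first three bases accumulate about 0.1002 expected errors, and adding the fourth would exceed 0.15, so the read is cut to three bases.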


Taxonomic assignment

Determine the taxonomic assignments of the OTUs through a consensus-based BLAST result (
Output: BLAST_BROCC_output/out_brocc/brocc.log

An optional but sometimes necessary helper is included to manually run the concatenation step for all the individual FastQC reports. This is needed because Snakemake errors out when, for example, one of the samples has an empty report.

