Read processing and filtering, de novo transcriptome assembly (Trinity), differential gene expression analysis and functional annotation of Orciraptor agilis RNA-seq.
- Run readprocessing_and_assembly.sh, output is de novo transcriptome assembly of Mougeotia sp.
- Predict ORFs to use later for decontamination: transdecoder.sh
- Remove contigs shorter than 200 nt from the transcriptome: seqkit_length.sh (only for upload)
- Run symlinks.sh
- Run readprocessing.sh. Output is quality-filtered and adapter-trimmed reads.
- Run mapping.sh. Output is the set of reads that do not map to sequences from rRNA and/or Mougeotia sp.
- Run assembly.sh to assemble the transcriptome from processed reads of all libraries. Output is de novo transcriptome assembly of Orciraptor agilis as a fasta.
- Run blastn search with this transcriptome (nt database v5 updated on 2021-03-10): blastn.sh
- Check contigs with > 95% identity over a length of at least 100 nt and save the contig identifiers of all bacterial, viral, ribosomal and algal contigs in contaminants.txt
- Remove these sequences from transcriptome with seqkit.sh
- ORF prediction with transdecoder.sh
- Use Change_Seqname_TransDecoder.py to change the sequence names in the TransDecoder output
- Perform a diamond blastp search with diamond.sh comparing all ORFs against each other
- Run ParseORFsVSORFsblastp.py on the diamond output and the renamed TransDecoder file to obtain a non-redundant .pep file, then rename the output to "Orciraptor_non-redundant.faa"
- Use Change_Seqname_TransDecoder.py to change the names of the corresponding coding sequences
- Call Extract_seq_using_FASTA.py to obtain the coding sequences of the non-redundant .pep file
- Run eggnog-mapper in diamond and hmm mode: eggnog.sh.
- Run InterProScan using interproscan.sh
- Run a Diamond blastp search vs Swiss-Prot database (UniProt Release 2021_01): diamond.sh
- Annotate carbohydrate-active enzymes (CAZymes) with dbcan2 in HMM mode (database dbCAN-HMMdb-V9): cazy.sh
- Map the processed reads back to the newly generated transcriptome with bowtie2 and count with salmon in alignment mode: bowtie2.sh
- Run DESeq2 analysis and generate figures (DESeq2/DESeq2.R)
- Number and length statistics of contigs are calculated with the TrinityStats.pl script from the Trinity toolkit
- Number, completeness and orientation of ORFs are summarised with transdecoder_count.sh
- ExN50 statistic is calculated with ExN50.sh
- Run AssignRandomSeqnames.py on input sequences to generate random names
- Simplify tip labels and generate annotation file for FigTree with Renaming_and_annotation.R
- Run alignment and trimming: align_and_trim.sh
- Find best model with IQtree_modelfind.sh
- Generate tree with IQtree.sh
- Run RenameTrees.py to get tip labels
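
The contaminant check above (contigs with > 95% identity over at least 100 nt) can be pre-screened programmatically before the taxonomic assignment of the hits is inspected. The following is a minimal sketch, not the script used in this repository. It assumes that the blastn results are in BLAST tabular format (-outfmt 6) with the default column order (qseqid, sseqid, pident, length, ...). The function name, the example contig names and the example hits are hypothetical. Deciding which of the resulting contigs are bacterial, viral, ribosomal or algal still requires looking at the subject taxonomy and is not covered here.

```python
import csv


def candidate_contaminants(blast_lines, min_identity=95.0, min_length=100):
    """Return sorted query IDs with at least one hit passing both thresholds.

    blast_lines: iterable of tab-separated lines in BLAST -outfmt 6 with the
    default column order (assumption): qseqid, sseqid, pident, length, ...
    A hit passes if pident > min_identity and length >= min_length.
    """
    passing = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        # Skip empty lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        qseqid = row[0]
        pident = float(row[2])  # percent identity of the alignment
        length = int(row[3])    # alignment length
        if pident > min_identity and length >= min_length:
            passing.add(qseqid)
    return sorted(passing)


# Hypothetical example hits, for illustration only.
EXAMPLE_HITS = "\n".join([
    "contig_1\tsubjA\t98.5\t250\t3\t0\t1\t250\t10\t259\t1e-120\t440",
    "contig_2\tsubjB\t96.0\t80\t3\t0\t1\t80\t5\t84\t1e-30\t130",
    "contig_3\tsubjC\t90.0\t500\t50\t0\t1\t500\t1\t500\t1e-150\t600",
    "contig_1\tsubjD\t99.0\t120\t1\t0\t1\t120\t1\t120\t1e-55\t215",
])

if __name__ == "__main__":
    for contig_id in candidate_contaminants(EXAMPLE_HITS.splitlines()):
        print(contig_id)
```

For a real run, the function could be given an open file handle of the blastn output instead of the example lines, and the returned identifiers reviewed before being written to contaminants.txt.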