Automated TopHat-Cuffdiff pipeline for Illumina paired-end data. This is a basic script written as part of another pipeline. It will be updated to follow current Python best practices; for example, calling external tools through os.system() is no longer recommended (the subprocess module is preferred).
This script performs:
- Alignment to the genome with tophat2
- Removal of reads with mapping quality below 50 (only uniquely mapped reads are kept).
- Sorting and indexing of the BAM files.
- A Cuffdiff run based on the experimental grouping.
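The steps above can be sketched as command lines built in Python and run with subprocess.run instead of os.system(). This is a minimal sketch, not the script's actual code: the helper names and the output layout are hypothetical, the samtools calls assume samtools 1.x syntax, and passing the genome FASTA to Cuffdiff through -b (bias correction) is an assumption about how that input is used.

```python
import subprocess


def tophat_cmd(index_base, gtf, r1, r2, out_dir, threads):
    """TopHat2 alignment of one paired-end sample (hypothetical helper)."""
    return ["tophat2", "-p", str(threads), "-G", gtf, "-o", out_dir,
            index_base, r1, r2]


def filter_cmd(in_bam, out_bam, min_mapq=50):
    """Keep only reads with MAPQ >= min_mapq (50 marks a unique TopHat2 hit)."""
    return ["samtools", "view", "-b", "-q", str(min_mapq), "-o", out_bam, in_bam]


def sort_cmd(in_bam, out_bam, threads):
    """Coordinate-sort a BAM file (samtools 1.x syntax)."""
    return ["samtools", "sort", "-@", str(threads), "-o", out_bam, in_bam]


def index_cmd(bam):
    """Index a sorted BAM file."""
    return ["samtools", "index", bam]


def cuffdiff_cmd(gtf, genome_fa, groups, out_dir, threads):
    """groups maps a group label to its list of replicate BAM files.

    Labels are used in dict insertion order.
    """
    labels = list(groups)
    cmd = ["cuffdiff", "-p", str(threads), "-o", out_dir,
           "-b", genome_fa, "-L", ",".join(labels), gtf]
    # One comma-separated list of replicates per group
    cmd += [",".join(groups[label]) for label in labels]
    return cmd


def run(cmd):
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Building each command as a list avoids shell quoting problems with paths that contain spaces, and check=True stops the pipeline as soon as one tool fails.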
This script takes:
- Path to the set of input FASTQ files (Illumina paired-end files following the Illumina naming convention).
- Path to genome fasta file.
- Path and base name of Bowtie2 index files
- Path to GTF file
- Number of processors
- Experimental grouping file
This script assumes:
- The Cufflinks package (v2) is installed and added to PATH.
- samtools is installed and added to PATH.
The experimental grouping file should list each R1 file and its group number, separated by a tab. An example file is given below:
c1r1_R1_001.fastq.gz 1
c1r2_R1_001.fastq.gz 1
c1r3_R1_001.fastq.gz 1
c2r1_R1_001.fastq.gz 2
c2r2_R1_001.fastq.gz 2
c2r3_R1_001.fastq.gz 2
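A grouping file in this format could be read as sketched below. This is an illustration under stated assumptions, not the script's own parser: the function name read_groups is hypothetical, and the R2 file name is assumed to differ from the R1 name only in the last "_R1_" token, as in the Illumina naming convention.

```python
import csv
from collections import defaultdict


def read_groups(path):
    """Return {group: [(r1, r2), ...]} from a tab-separated grouping file."""
    groups = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or not row[0].strip():
                continue  # skip blank lines
            if len(row) < 2:
                raise ValueError("expected '<R1 file><TAB><group>': %r" % row)
            r1, group = row[0].strip(), row[1].strip()
            # Assumption: the mate file differs only in the last _R1_ token
            r2 = "_R2_".join(r1.rsplit("_R1_", 1))
            groups[group].append((r1, r2))
    return dict(groups)
```

For the example file above, group "1" would hold the three c1 replicates, each paired with its matching _R2_ file, and group "2" the three c2 replicates.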
Usage:
1. Open the script in any text editor and give the full path to all the required files/folders.
2. Then run the script using the following command:
python RNA-SEQ.py
To do:
Flexibility to give extra parameters to TopHat2 and cuffdiff
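One possible way to approach this item is to keep the extra options as a plain string and splice them into the command list. The sketch below is only an idea, not existing functionality: the helper name with_extra_args is hypothetical, and it assumes each command is built as a list with the program name first.

```python
import shlex


def with_extra_args(base_cmd, extra):
    """Insert user-supplied options right after the program name.

    Options go before the positional arguments, which is where
    TopHat2 and Cuffdiff expect them.
    """
    return [base_cmd[0]] + shlex.split(extra or "") + base_cmd[1:]


# Example: add a TopHat2 library type to a command built elsewhere
cmd = with_extra_args(
    ["tophat2", "-p", "4", "index_base", "r1.fastq.gz", "r2.fastq.gz"],
    "--library-type fr-firststrand",
)
```

shlex.split honours shell-style quoting, so an option value that contains spaces can be passed in quotes.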