Skip to content

Latest commit

 

History

History
264 lines (206 loc) · 10.9 KB

README.md

File metadata and controls

264 lines (206 loc) · 10.9 KB

Benchmarking ChIP-Seq Peak Callers

Table of Contents

  1. PeakRanger
  2. MAC2
  3. SICER
  4. GEM
  5. MUSIC
  6. PePr
  7. DFilter

1. PeakRanger

Description:

PeakRanger is a multi-purporse software suite for analyzing next-generation sequencing (NGS) data. It contains the following tools:

  1. nr: a noise ratio estimator useful for QC statistics. Estimates signal to noise ratio which is an indicator for ChIP enrichment.
  2. lc: library complexity calculator useful for QC statistics. Calculates the ratio of unique reads over total reads. Only accepts bam files.
  3. ranger: ChIP-Seq peak caller. Ranger servers better as a narrow-peak caller. It behaves in a conservative but sensitive way compared to similar algorithms. It is able to identify enriched genomic regions while at the same time discover summits within these regions. Ranger supports HTML-based annotation reports.
  4. bcp: ChIP-Seq peak caller. Tuned for the discovery of broad peaks. BCP supports HTML-based annotation reports.
  5. ccat: ChIP-Seq peak caller. Tuned for the discovery of broad peaks. CCAT supports HTML-based annotation reports.

Peakranger is installed on Biowulf.

Loading PeakRanger on Biowulf:
module load peakranger
Running NR (Noise Ratio Estimator):
peakranger nr \
--format bam \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results
Running LC (Library Complexity Calculator):
peakranger lc \
--format bam \
{*.bam} \
--output bcp_results  
Running Ranger (Narrow Peak Caller):
peakranger ranger \
--format bam \
--report \
--plot_region 10000 \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results
-t 4
Running BCP (Broad Peak Caller):
peakranger bcp \
--format bam \
--report \
--plot_region 10000 \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results
-t 4
Running CCAT (Broad Peak Caller):
peakranger ccat \
--format bam \
--report \
--plot_region 10000 \
--data {expt1.bam} \
--control {control.bam} \
--output bcp_results
-t 4

2. MAC2

Description:

MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms and can be used for ChIP-Seq with or without control samples.MAC2 is installed on Biowulf.

Loading MACs on Biowulf:
module load macs
Running MAC2 (Narrow Peak Mode):
module load macs/2.1.0.20150420 R
macs2 callpeak -t {input[0]} \
-c {input[1]} -f BAM -g {config[macs_g]} \
--outdir peaks/mac2/narrow -n {wildcards.sample} \
--nomodel --extsize {usePhantomPeaks.Rscript} -B -q 0.01 &> {log}
cd peaks/mac2/narrow && Rscript {wildcards.sample}_model.r
Running MAC2 (Broad Peak Mode):
module load macs/2.1.0.20150420
    macs2 callpeak -t {input[0]} \
    -c {input[1]} -f BAM -g {config[macs_g]} \
    --broad --broad-cutoff 0.1 --nomodel --extsize {usePhantomPeaks.Rscript} \
    --outdir peaks/mac2/broad -n {wildcards.sample} -q 0.001 &> {log}

3. SICER

Description:

Sicer is a clustering approach for identification of enriched domains from histone modification ChIP-Seq data.

Loading SICER on Biowulf:
module load sicer
Running SICER with controls (Narrow Peaks):
bash {params.SICERDIR}/SICER.sh ./ {wildcards.name}.bed {params.ctrl}.bed ./ hg18 1 100 {getfromPhantomPeaks} 0.79 200 0.01
Running SICER with controls (Broad Peaks):
bash {params.SICERDIR}/SICER.sh ./ {wildcards.name}.bed {params.ctrl}.bed ./ hg18 1 200 {getfromPhantomPeaks} 0.79 400 0.01

Example: $sh DIR/SICER.sh ["InputDir"] ["bed file"] ["control file"] ["OutputDir"] ["Species"] ["redundancy threshold"] ["window size (bp)"] ["fragment size"] ["effective genome fraction"] ["gap size (bp)"] ["FDR"]

Running SICER without controls (Narrow Peaks):
bash {params.SICERDIR}/SICER-rb.sh ./ {wildcards.name}.bed ./ hg18 1 100 {getfromPhantomPeaks} 0.79 200 100
Running SICER without controls (Broad Peaks):
bash {params.SICERDIR}/SICER-rb.sh ./ {wildcards.name}.bed ./ hg18 1 200 {getfromPhantomPeaks} 0.79 400 100

Example: $sh DIR/SICER-rb.sh ["InputDir"] ["bed file"] ["OutputDir"] ["species"] ["redundancy threshold"] ["window size (bp)"] ["fragment size"] ["effective genome fraction"] ["gap size (bp)"] ["E-value"]

Meanings of the parameters that are not self-explanatory:

  • Species: allowed species and genome versions are listed in GenomeData.py. You can add your own species and/or genome versions and relevant data there. Redundancy Threshold: The number of copies of identical reads allowed in a library.
  • Window size: resolution of SICER algorithm. For histone modifications, one can use 200 bp
  • Fragment size: is for determination of the amount of shift from the beginning of a read to the center of the DNA fragment represented by the read. FRAGMENT_SIZE=150 means the shift is 75.
  • Effective genome fraction: Effective Genome as fraction of the genome size.
  • Gap size: needs to be multiples of window size. Namely if the window size is 200, the gap size should be 0, 200, 400, 600, ….

4. GEM

Description:

GEM is a high-resolution peak calling and motif discovery tool for ChIP-seq and ChIP-exo data. GEM only supports BED and SAM alignment file formats. GEM is installed on Biowulf.

Loading GEM on Biowulf:
module load gem
Running GEM:
java -Xmx10g -jar $GEMJAR --t 24 \
--d ./Read_Distribution_default.txt \
--g ./mm10.chrom.sizes 
--genome /fdb/igenomes/Mus_musculus/UCSC/mm10/Sequence/Chromosomes/ \
--s 2000000000 
--expt SRX000540_mES_CTCF.bed \
--ctrl SRX000543_mES_GFP.bed \
--f BED \
--out mouseCTCF --k_min 6 --k_max 13
  • File for --d ./Read_Distribution_default.txt can be found here and files for --g ./*.chrom.sizes can be found here

5. MUSIC

Description:

MUSIC is a tool for identification of enriched regions at multiple scales in the read depth signals from ChIP-Seq experiments. MUSIC is installed on Biowulf.

Loading Music on Biowulf:
module load samtools  # needed to convert to sam format
module load music
Running Music:
mkdir chip; mkdir input
samtools view chip.bam | MUSIC -preprocess SAM stdin chip/ 
samtools view input.bam | MUSIC -preprocess SAM stdin input/
samtools view /directory/to/chip.bam | MUSIC -preprocess SAM stdin chip/ 
samtools view /directory/to/input.bam | MUSIC -preprocess SAM stdin input/
mkdir chip/sorted;mkdir chip/dedup;mkdir input/sorted;mkdir input/dedup
MUSIC -sort_reads chip chip/sorted 
MUSIC -sort_reads input input/sorted 
MUSIC -remove_duplicates chip/sorted 2 chip/dedup 
MUSIC -remove_duplicates input/sorted 2 input/dedup
  
MUSIC -get_multiscale_broad_ERs \
-chip chip/dedup \
-control input/dedup \
-mapp Mappability_36bp \
-l_mapp 36 \
-begin_l 1000 \
-end_l 16000 \
-step 1.5

6. PePr

Description:

PePr is a ChIP-Seq Peak-calling and Prioritization pipeline that uses a sliding window approach and models read counts across replicates and between groups with a negative binomial distribution. PePr empirically estimates the optimal shift/fragment size and sliding window width, and estimates dispersion from the local genomic area. Regions with less variability across replicates are ranked more favorably than regions with greater variability. Optional post-processing steps are also made available to filter out peaks not exhibiting the expected shift size and/or to narrow the width of peaks. PePr is installed on Biowulf.

Loading PePr on Biowulf:
module load PePr
Running Pepr:
PePr -c chip_rep1.bam,chip_rep2.bam \
-i input_rep1.bam,input_rep2.bam \
-f bam \
-n {expname}
  • --shiftsize Half the fragment size.. again comes for ppqt... this should be half ext size that we used for macs -f needs to be bampe for PE data...not to worry about this now --> this seems to get *

7. DFilter

Description:

DFilter has been made to detect regulatory regions and enriched sites using tag count data. It has been made using a generalized approach so that data from multiple kinds of assays can be analyzed. The raw tags files can be in 6-column bed file, bedgraph, bam or sam format. For more information, read through DFilter's documentation.

Location of DFilter:
/data/CCBR_Pipeliner/db/PipeDB/bin/DFilter1.6
Running DFilter (Narrow Peaks):
  run_dfilter.sh -d=ChIP.bed -c=input-control.bed -o=peaks.bed -ks=15 -lpval=6 -nonzero -refine -bs=50
Running DFilter (Broad Peaks):
  run_dfilter.sh -d=ChIP.bed -c=input-control.bed -o=peaks.bed -ks=25 -lpval=3 -nonzero -bs=100

Running DFilter (Open-chromatin ~ATAC-seq):

  run_dfilter.sh -d=Dnase-seq.bed -o=peaks.bed -ks=50 -lpval=2 -bs=100

Back to Top