Peak calling software tools | CLIP sequencing data analysis
Peak calling is a central step of the analysis: it separates specific signal from background noise in order to identify real binding sites. The number of identified peaks increases with sequencing depth, because weaker sites become statistically significant as more reads accumulate. However, the optimal sequencing depth can only be determined experimentally, as it depends on the background noise of the antibody.
A model-based approach to identify high-confidence protein-RNA binding sites from CLIP-seq datasets. This approach assigns a probability score to each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested on both HITS-CLIP and PAR-CLIP datasets.
A peak-calling tool based on a zero-truncated negative binomial regression model that can incorporate external information to guide the site identification process. Piranha can optionally include external covariates in the peak-calling process and can identify sites of differential binding occupancy between cell types, conditions or developmental stages. Transcript abundance influences the read counts at sites in IP datasets, and Piranha can incorporate RNA-seq control data to reduce this bias. Taking this additional information into account yields more accurate peak calls.
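The scoring idea behind a zero-truncated negative binomial model can be sketched in a few lines of Python. This is a minimal illustration and not Piranha's code or interface: the function names, the log-linear link to a single covariate and all parameter values are assumptions made for the example, and the regression coefficients are taken as given rather than fitted.

```python
import math

def nb_pmf(k, mu, size):
    """Negative binomial probability of count k, given mean mu and dispersion size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

def ztnb_pvalue(count, mu, size):
    """P(X >= count | X > 0): upper tail of the zero-truncated distribution.

    Sites with zero reads are never observed in a CLIP read profile,
    so the distribution is renormalised over counts of at least 1.
    """
    if count < 1:
        raise ValueError("zero-truncated counts start at 1")
    p_zero = nb_pmf(0, mu, size)
    lower = sum(nb_pmf(k, mu, size) for k in range(count))
    return max(0.0, 1.0 - lower) / (1.0 - p_zero)

def expected_mean(intercept, slope, covariate):
    """Hypothetical log-linear link: the expected count grows with a covariate
    such as the RNA-seq abundance of the same region."""
    return math.exp(intercept + slope * math.log(covariate))

# The same 10 reads are less surprising in a highly expressed transcript.
p_low_expr = ztnb_pvalue(10, expected_mean(0.0, 1.0, 2.0), size=1.5)
p_high_expr = ztnb_pvalue(10, expected_mean(0.0, 1.0, 8.0), size=1.5)
print(p_low_expr, p_high_expr)
```

A count of 1 has a p-value of 1 by construction, because every observed site has at least one read. The comparison at the end shows why a covariate matters: an identical read count receives a larger p-value where the expected background is higher.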
Detects functional signals in tag profiles from different assays such as histone ChIP-seq, TF ChIP-seq, DNase-seq and FAIRE-seq. DFilter is based on a single receiver operating characteristic – area under the curve (ROC-AUC) optimizing algorithm. This software is well suited to genomic signals of individual cell types diluted in a cellular mixture, because such marginal signals can be mistaken for noise by suboptimal algorithms.
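For readers unfamiliar with the criterion, ROC-AUC can be read as the probability that a randomly chosen true-signal region scores higher than a randomly chosen noise region. The sketch below computes only that quantity from two lists of scores. It does not reproduce DFilter's filter design, and the function name and example scores are hypothetical.

```python
def roc_auc(signal_scores, noise_scores):
    """ROC-AUC as the fraction of (signal, noise) pairs ranked correctly.

    Ties count as half a correct pair, which matches the
    Mann-Whitney U formulation of the area under the ROC curve.
    """
    if not signal_scores or not noise_scores:
        raise ValueError("both score lists must be non-empty")
    correct = 0.0
    for s in signal_scores:
        for n in noise_scores:
            if s > n:
                correct += 1.0
            elif s == n:
                correct += 0.5
    return correct / (len(signal_scores) * len(noise_scores))

# Hypothetical detector scores for true sites and for background bins.
print(roc_auc([3, 4], [1, 2]))  # perfectly separated scores
print(roc_auc([1, 2], [1, 2]))  # scores carry no information
```

An AUC of 1.0 means the detector ranks every signal region above every noise region, and 0.5 means it does no better than chance. A detector tuned to maximise this value therefore keeps weak but real signals ranked above background.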
Defines peaks in CLIP-seq datasets. CLIPper combines features from many CLIP peak-finding algorithms. To reduce false positives, it employs a three-pass filter on the candidate peaks. For each gene, it calculates the false-discovery rate (FDR) threshold, which is the "height" of reads mapped at a single genomic position that is likely to be noise. This threshold is determined by randomly scattering the same number of faux reads as real reads across a faux transcript that has the same effective length as the real transcript.
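The read-scattering procedure described above can be sketched as a small simulation. This is an illustrative approximation rather than CLIPper's implementation: the function name, the uniform placement of reads, the fixed read length, the number of shuffles and the exact rule for turning the simulated height distribution into a cutoff are all assumptions.

```python
import random

def height_threshold(n_reads, read_len, transcript_len,
                     fdr=0.05, n_iter=200, seed=0):
    """Smallest coverage height that randomly scattered reads rarely reach."""
    if read_len > transcript_len:
        raise ValueError("read_len must not exceed transcript_len")
    rng = random.Random(seed)
    max_start = transcript_len - read_len
    height_counts = {}  # height -> number of covered positions over all shuffles
    for _ in range(n_iter):
        # Difference array: +1 where a faux read starts, -1 just past its end.
        diff = [0] * (transcript_len + 1)
        for _ in range(n_reads):
            start = rng.randint(0, max_start)
            diff[start] += 1
            diff[start + read_len] -= 1
        depth = 0
        for pos in range(transcript_len):
            depth += diff[pos]
            if depth > 0:
                height_counts[depth] = height_counts.get(depth, 0) + 1
    if not height_counts:
        return 1  # no reads placed, so any coverage exceeds the background
    total = sum(height_counts.values())
    tail = 0
    threshold = max(height_counts) + 1
    # Walk down from the tallest height while the share of covered
    # positions at or above that height stays within the FDR.
    for height in sorted(height_counts, reverse=True):
        tail += height_counts[height]
        if tail / total > fdr:
            break
        threshold = height
    return threshold

# A deeply covered gene needs a taller pile of reads to stand out.
print(height_threshold(10, 30, 1000), height_threshold(500, 30, 1000))
```

Because the threshold is computed per gene from that gene's own read count and effective length, highly expressed transcripts get a higher cutoff than sparsely covered ones.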
A method and software package for including both gene- and TE-associated ambiguously mapped reads in differential expression analysis. TEtranscripts shows improved recovery of TE transcripts over other published expression analysis methods, in both synthetic data and qPCR/NanoString-validated published datasets. TEpeaks can be used for analysis of HITS-CLIP data and enhances the method implemented by MACS software by identifying ‘narrow’ peaks.