Peak calling software tools | CLIP sequencing data analysis
Assessing peaks is a central step of the analysis to determine specific signal over the noise background for the identification of real binding sites. The number of identified peaks increases with the sequencing depth because weaker sites become statistically significant with a greater number of reads. However, the optimal sequencing depth can only be experimentally evaluated, as it depends on the noise background of the antibody.
Detects functional signals in tag profiles from different assays such as histone ChIP-seq, TF ChIP-seq, DNase-seq and FAIRE-seq. DFilter is based on a single algorithm that optimizes the receiver operating characteristic area under the curve (ROC-AUC). The software is well suited to genomic signals from individual cell types diluted in a cellular mixture, where suboptimal algorithms can mistake a proportion of the marginal signals for noise.
Defines peaks in CLIP-seq datasets. CLIPper combines features from many CLIP peak-finding algorithms. To reduce false positives, it employs a three-pass filter on the peaks. For each gene, it calculates a false discovery rate (FDR) threshold, which is the "height" of reads mapped at a single genomic position that is likely to be noise. The threshold is determined by randomly scattering the same number of faux reads as real reads across a faux transcript that has the same effective length as the real transcript.
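The random-scattering idea described above can be sketched in a few lines of Python. This is a simplified illustration, not CLIPper's actual implementation: the function names, the fixed read length, and the rule of taking a quantile of the per-simulation maximum height are all assumptions made for the example.

```python
import random

def max_random_height(n_reads, read_len, transcript_len, rng):
    """Scatter n_reads faux reads uniformly and return the tallest pile-up."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (transcript_len + 1)
    for _ in range(n_reads):
        start = rng.randrange(transcript_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    height = best = 0
    for d in diff[:-1]:
        height += d
        best = max(best, height)
    return best

def height_threshold(n_reads, read_len, transcript_len,
                     alpha=0.05, n_iter=200, seed=0):
    """Smallest height that random scattering reaches in under ~alpha of runs.

    Assumes read_len <= transcript_len. The quantile rule is a simplification
    chosen for this sketch.
    """
    rng = random.Random(seed)
    maxima = sorted(
        max_random_height(n_reads, read_len, transcript_len, rng)
        for _ in range(n_iter)
    )
    k = min(n_iter - 1, int((1 - alpha) * n_iter))
    return maxima[k] + 1

if __name__ == "__main__":
    # A transcript with more reads needs a taller pile-up to stand out.
    print(height_threshold(50, 30, 1000))
    print(height_threshold(500, 30, 1000))
```

Positions in the real data whose read height reaches the returned value would then be kept as candidate peaks, while lower pile-ups are treated as indistinguishable from noise for that transcript.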
Controls for systematic mis-assignments in individual-nucleotide resolution crosslinking and immunoprecipitation (iCLIP) data. iCLIPro first visualizes coinciding and noncoinciding fragment start sites in an aligned BAM file and then generates overlap heatmaps to identify the best way to analyze the iCLIP data. The tool can also recognize regions for a read overlap test.
Precisely captures protein-RNA interaction footprints from iCLIP/eCLIP-seq data. PureCLIP provides a promising method for analysing datasets, including those for proteins with lower binding affinities or proteins binding to low-abundance RNAs such as lncRNAs. It is able to incorporate RNA abundances and non-specific sequence biases. The method uses a nonhomogeneous hidden Markov model (HMM) to incorporate these additional factors into the model.
Identifies target sites via an unsupervised segmentation of the transcriptome. omniCLIP is a Bayesian method that detects regulatory regions in data from different CLIP-seq protocols. The method learns the relevant diagnostic events directly from the data and automatically uses them during peak calling. Furthermore, it explicitly accounts for confounding factors as well as technical and biological variance.
A statistical and computational framework for PAR-CLIP data analysis. It provides a sensitive transition-centered algorithm specifically designed to resolve protein binding sites at high resolution in PAR-CLIP data. The method employs a Bayesian network approach to associate posterior log-odds with the observed transitions, providing an overall quantification of the confidence in RNA-protein interaction.
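To give a feel for what a posterior log-odds on observed transitions means, here is a deliberately simple two-hypothesis binomial sketch. It is not the Bayesian network used by the tool: the function name, the crosslink-induced transition rate `p_signal`, the background error rate `p_noise`, and the prior are all hypothetical values chosen for illustration.

```python
from math import log

def transition_log_odds(k, n, p_signal=0.3, p_noise=0.01, prior=0.1):
    """Posterior log-odds that a position is crosslinked.

    k: reads showing the diagnostic transition at the position
    n: total reads covering the position
    p_signal, p_noise, prior: assumed (hypothetical) parameters.
    The binomial coefficient cancels in the ratio, so it is omitted.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    ll_signal = k * log(p_signal) + (n - k) * log(1 - p_signal)
    ll_noise = k * log(p_noise) + (n - k) * log(1 - p_noise)
    return (ll_signal - ll_noise) + log(prior / (1 - prior))

if __name__ == "__main__":
    for k in (0, 1, 5, 10):
        print(k, round(transition_log_odds(k, 20), 2))
```

Positive values favour a genuine crosslink-induced transition and negative values favour sequencing error or background, so the score can be used to rank candidate positions.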
Performs the analysis of RNAs isolated by crosslinking immunoprecipitation (HITS-CLIP). CLIPick provides a pipeline that uses expression-dependent background simulation to assess the significance of peak signals. The program includes three steps: (i) refining CLIP clusters; (ii) picking peaks with significant height; and (iii) resolving RNA-binding protein (RBP) footprint regions as peak widths. It can be used to investigate RBP regulation.
A peak-calling tool based on the zero-truncated negative binomial regression model that is able to incorporate external information to guide the site identification process. Piranha can optionally incorporate additional external covariates into the peak-calling process and identify sites of differential binding occupancy between cell types, conditions or developmental stages. Transcript abundance influences the read counts at sites in IP datasets, and Piranha can incorporate RNA-seq control data to ameliorate this bias. Considering this additional information yields more accurate peak calls.
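The zero-truncated negative binomial distribution underlying this kind of model can be illustrated with a short standard-library sketch. It is not Piranha's code: the function names and the mean/size parameterisation are assumptions for the example, and the mean `mu` is passed in directly, whereas in a regression setting it would be a function of the covariates.

```python
from math import exp, lgamma, log

def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu > 0 and dispersion parameter size > 0."""
    p = size / (size + mu)
    log_pmf = (lgamma(k + size) - lgamma(k + 1) - lgamma(size)
               + size * log(p) + k * log(1 - p))
    return exp(log_pmf)

def ztnb_pmf(k, mu, size):
    """Zero-truncated pmf: bins with zero reads are never observed."""
    if k < 1:
        return 0.0
    return nb_pmf(k, mu, size) / (1.0 - nb_pmf(0, mu, size))

def ztnb_pvalue(k, mu, size):
    """P(X >= k | X > 0): chance of a bin at least this tall under background."""
    below = sum(ztnb_pmf(j, mu, size) for j in range(1, k))
    return max(0.0, 1.0 - below)

if __name__ == "__main__":
    # Background bins average 3 reads; how surprising are 5 or 20 reads?
    print(ztnb_pvalue(5, mu=3.0, size=2.0))
    print(ztnb_pvalue(20, mu=3.0, size=2.0))
```

Truncation at zero matters because only bins that contain at least one read appear in the data, so the background distribution has to be renormalised over counts of one or more.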
A method and software package for including both gene- and TE-associated ambiguously mapped reads in differential expression analysis. TEtranscripts shows improved recovery of TE transcripts over other published expression analysis methods, in both synthetic data and qPCR/NanoString-validated published datasets. TEpeaks can be used for analysis of HITS-CLIP data and enhances the method implemented by MACS software by identifying ‘narrow’ peaks.
Enables robust peak detection even in low-abundance transcripts. ASPeak is a fast and efficient expression-sensitive peak caller for CLIP- and RIP-Seq data that is sensitive to differential expression levels of target transcripts. The implementation can run on multiple processors, resulting in a significant speedup when used in high-performance computing centers. It is supported by extensive documentation that allows experienced bioinformaticians to customize their analyses using detailed parameter files.
A model-based approach to identify high-confidence protein-RNA binding sites from CLIP-seq datasets. This approach assigns a probability score for each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested in both HITS-CLIP and PAR-CLIP datasets.
Identifies RNA motifs at full structural resolution while also being able to simultaneously deconvolve multiple motifs. SARNAclust is a clustering algorithm for CLIP peaks that typically uses the secondary structure prediction from the tool RNApeakFold. The software makes use of a graph kernel to evaluate similarity between sequence/structure objects, and provides the ability to isolate the impact of specific features through a bulge graph formalism.
A toolkit for processing and aligning short and error-prone sequencing reads. PARA-suite is implemented in Java using HTSjdk, a Java API for high-throughput sequencing data formats. The PARA-suite allows the user to estimate a sequencing run-specific error profile, combine the results of multiple reference database alignments, cluster an aligned sequencing read dataset (‘PAR-CLIP read simulator and hierarchical clustering’), run the PAR-CLIP read simulator, benchmark an alignment of simulated PAR-CLIP sequencing reads and run a full processing pipeline for error-prone short read alignments.
A model-based approach to detect RNA-RBP binding sites in PAR-CLIP data. PAR-CLIP HMM integrates models for identifying enriched regions and high-confidence binding sites into one rigorous statistical model. An advantage of this integrative modeling is that the posterior probability of being a binding site is estimated from the data with less information loss than in two-stage modeling approaches. This facilitates more accurate statistical inference, so the method should provide more reliable binding sites based on the false discovery rate.
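As an illustration of how an HMM yields a posterior probability of being a binding site at each position, the following sketch runs the forward-backward algorithm on a generic two-state model. It is not the model used by PAR-CLIP HMM: the Poisson emissions, the rates, the `stay` probability and the function names are all assumptions made for the example.

```python
from math import exp, lgamma, log

def poisson_pmf(k, lam):
    """Poisson probability of observing k reads when the rate is lam > 0."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def posterior_bound(counts, rates=(1.0, 8.0), stay=0.9, start=(0.5, 0.5)):
    """Posterior P(state = bound) per position for a non-empty count list.

    State 0 is background and state 1 is bound; all parameters are
    hypothetical defaults, not fitted values.
    """
    n = len(counts)
    states = (0, 1)
    trans = [[stay, 1 - stay], [1 - stay, stay]]
    emit = [[poisson_pmf(c, lam) for lam in rates] for c in counts]

    # Forward pass, rescaled at each step to avoid numerical underflow.
    prev = [start[s] * emit[0][s] for s in states]
    z = sum(prev)
    prev = [p / z for p in prev]
    fwd = [prev]
    for t in range(1, n):
        cur = [emit[t][s] * sum(prev[r] * trans[r][s] for r in states)
               for s in states]
        z = sum(cur)
        prev = [c / z for c in cur]
        fwd.append(prev)

    # Backward pass, also rescaled; the scale cancels when normalising below.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in states)
               for s in states]
        z = sum(nxt)
        bwd[t] = [x / z for x in nxt]

    post = []
    for t in range(n):
        gamma = [fwd[t][s] * bwd[t][s] for s in states]
        post.append(gamma[1] / sum(gamma))
    return post

if __name__ == "__main__":
    print([round(p, 3) for p in posterior_bound([0, 1, 0, 9, 10, 8, 0, 1])])
```

Because each posterior uses evidence from the whole sequence in a single model, no hard decision about enriched regions has to be made before binding sites are scored, which is the kind of information loss that two-stage approaches incur.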
Detects peaks in HITS-CLIP data and generates base-specific coverage metrics. CLIP-PyL needs aligned BAM files as input, along with a BED file containing the gene or transcript in which the user is interested. The program creates pile-ups of mapped reads and can generate a PDF file with coverage plots or BedGraph files that can be uploaded to any genome browser for visualization.
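The step from read pile-ups to BedGraph output can be sketched as follows. This is a minimal illustration rather than CLIP-PyL's code: the function name is hypothetical, and reads are assumed to be already parsed into 0-based, half-open (chromosome, start, end) tuples instead of being read from a BAM file.

```python
from collections import defaultdict

def coverage_bedgraph(reads):
    """Turn (chrom, start, end) read intervals into BedGraph lines.

    Coordinates are assumed 0-based and half-open, as in BED.
    Regions with zero coverage are omitted.
    """
    # Per chromosome: +1 where a read starts, -1 where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in reads:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    lines = []
    for chrom in sorted(deltas):
        depth = 0
        prev = None
        for pos in sorted(deltas[chrom]):
            if deltas[chrom][pos] == 0:
                continue  # depth unchanged here, so keep the run going
            if prev is not None and depth > 0:
                lines.append(f"{chrom}\t{prev}\t{pos}\t{depth}")
            depth += deltas[chrom][pos]
            prev = pos
    return lines

if __name__ == "__main__":
    demo = [("chr1", 0, 10), ("chr1", 5, 15), ("chr2", 100, 120)]
    print("\n".join(coverage_bedgraph(demo)))
```

Each output line gives a chromosome, a start, an end and the read depth over that run, which is the four-column layout that genome browsers accept for BedGraph tracks.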
An implementation of the modified false discovery rate algorithm proposed by Yeo et al. (2009) to determine which clusters are significant in a list of genomic regions (such as genes or transcripts). Pyicoclip is typically used with CLIP-Seq data and doesn’t need a control experiment to compare against. The method could in principle be used for any other kind of short-read data that doesn’t have a valid control.
Detects RNA-protein binding sites in HITS-CLIP. HITS-CLIP Analysis is a model-based approach that provides essential MATLAB functions to implement a model for the identification of binding sites using heterogeneous logit models via semi-supervised learning. This model is established on all the sequencing reads (including non-clustered read sequences) to investigate binding sites at single base pair resolution.
Performs transcriptomic peak re-mapping. TxCoords helps users retrieve the true sequences and correct incorrectly reported transcriptomic peaks. It also allows users to visualize the re-mapped peaks in the UCSC Genome Browser or in a schematic figure. The tool can be used in two ways: as a web application or as a local application.