Genomic region overlap analysis software tools | DNA annotation
A common question in genomic analysis is whether two sets of genomic intervals overlap more (or less) than expected by chance. This question arises, for example, when interpreting ChIP-seq or RNA-seq data in functional terms. Because genome organization is complex, answering it is non-trivial.
A tool for estimating the significance of overlap between multiple sets of genomic intervals. GAT implements a null model in which the two sets of intervals are placed independently of one another, while allowing each set's density to depend on external variables such as isochore structure or chromosome identity. GAT estimates statistical significance by simulation and controls for multiple testing using the false discovery rate (FDR).
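The simulation idea can be illustrated with a minimal Python sketch. It assumes a single uniform workspace of length `workspace_len` (GAT itself can additionally let density depend on isochores or chromosomes), non-negative half-open `(start, end)` coordinates, and hypothetical helper names; it is not GAT's implementation. Segments are re-placed at random with their lengths preserved, the number of bases overlapping the annotation is recomputed, and empirical p-values from several such tests are converted into Benjamini-Hochberg q-values to control the FDR.

```python
import random


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_bases(a, b):
    """Number of bases covered by both interval sets (each set is merged first)."""
    total = 0
    for s1, e1 in merge(a):
        for s2, e2 in merge(b):
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def simulate_pvalue(segments, annotations, workspace_len, n_sim=1000, seed=0):
    """Hypothetical helper: empirical one-sided p-value for overlap enrichment."""
    rng = random.Random(seed)
    observed = overlap_bases(segments, annotations)
    lengths = [end - start for start, end in segments]
    hits = 0
    for _ in range(n_sim):
        # Null model: place each segment uniformly at random, keeping its length.
        simulated = []
        for length in lengths:
            start = rng.randint(0, workspace_len - length)
            simulated.append((start, start + length))
        if overlap_bases(simulated, annotations) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg q-values, returned in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For example, `simulate_pvalue([(0, 100), (1000, 1100)], [(0, 100), (1000, 1100)], 100_000)` reports 200 overlapping bases and a small p-value, since random placements almost never reproduce a perfect overlap. The `+ 1` terms keep the empirical p-value strictly positive.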
An R package that implements a permutation test framework designed specifically for genomic regions. In addition to its predefined randomization and evaluation strategies, regioneR is fully customizable, allowing custom strategies to be adapted to specific questions. It also implements a novel function to evaluate the local specificity of a detected association. regioneR is thus a flexible tool for managing and analyzing sets of regions, and a useful addition to the NGS and genome-wide analysis toolbox.
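A permutation test with interchangeable randomization and evaluation functions, plus a shift-based local z-score, can be sketched in Python as follows. This mirrors the ideas described above but is not regioneR's R API: the names `count_overlapping`, `make_randomizer`, `perm_test`, and `local_z_scores` are hypothetical, coordinates are half-open on a single chromosome of length `genome_len`, and defining the local z-score by reusing the permutation mean and standard deviation for shifted copies of the query set is an assumption of this sketch.

```python
import random
import statistics


def count_overlapping(a, b):
    """Evaluation strategy: number of regions in `a` that overlap any region in `b`."""
    return sum(any(s1 < e2 and s2 < e1 for s2, e2 in b) for s1, e1 in a)


def make_randomizer(genome_len, seed=0):
    """Randomization strategy: move each region to a uniform random position, keeping its length."""
    rng = random.Random(seed)

    def randomize(regions):
        shuffled = []
        for start, end in regions:
            new_start = rng.randint(0, genome_len - (end - start))
            shuffled.append((new_start, new_start + (end - start)))
        return shuffled

    return randomize


def perm_test(a, b, randomize, evaluate, n_perm=500):
    """Compare evaluate(a, b) with its distribution over randomized versions of `a`."""
    observed = evaluate(a, b)
    null = [evaluate(randomize(a), b) for _ in range(n_perm)]
    mean, sd = statistics.mean(null), statistics.pstdev(null)
    # One-sided empirical p-value in the direction of the observed deviation.
    if observed >= mean:
        extreme = sum(x >= observed for x in null)
    else:
        extreme = sum(x <= observed for x in null)
    return {
        "observed": observed,
        "zscore": (observed - mean) / sd if sd > 0 else float("nan"),
        "pvalue": (extreme + 1) / (n_perm + 1),
        "null_mean": mean,
        "null_sd": sd,
    }


def local_z_scores(a, b, evaluate, result, shifts):
    """z-score after shifting every region in `a` by each offset (assumes null_sd > 0)."""
    return {
        shift: (evaluate([(s + shift, e + shift) for s, e in a], b) - result["null_mean"])
        / result["null_sd"]
        for shift in shifts
    }
```

Swapping in another evaluation function (for example, total overlapping bases) requires no change to `perm_test`, which is the point of keeping the strategies pluggable. A z-score that drops sharply as the shift grows suggests the association depends on the exact positions rather than on broad regional trends.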
Estimates genome-wide positional correlations. StereoGene can rapidly compute correlations between continuous genome-wide features, in addition to the discrete gene intervals used by most correlation techniques. It is applicable to assessing the similarity of biological data represented as genomic tracks, including large-scale analyses of many tracks. The tool provides a collection of genome-wide correlation techniques.
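To make positional correlation of continuous tracks concrete, here is a minimal Python sketch that correlates two tracks binned at the same resolution, both at matching positions and across a range of positional lags. It uses plain Pearson correlation on hypothetical toy data; StereoGene's own statistics are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(x, y, max_lag):
    """Correlation of x[i] with y[i + lag] for every lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(xs, ys)
    return result
```

With `x = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]` and `y` equal to the same peak moved two bins to the right, the correlation peaks at lag 2, showing how a lag scan detects features that are consistently offset rather than co-located.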
Allows users to identify and investigate multiple overlaps of genomic regions. MULTOVL is a suite of tools comprising: (i) MULTOVL, for estimating overlap null distributions; (ii) MULTOVLPROB, for distributing shuffled genomic regions into the free regions; (iii) PGMULTOVL, for handling tracks stored in PostgreSQL relational databases; and (iv) PARMULTOVLPROB, for shuffling regions on independent copies of the input data.
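Finding where several tracks overlap at once is commonly done with a sweep over interval start and end points. The Python sketch below is an assumed, simplified illustration of that idea, not MULTOVL's code: `tracks` is a hypothetical dict mapping track names to half-open `(start, end)` intervals on one chromosome, and the function reports maximal stretches covered by at least `min_tracks` distinct tracks, together with the names of those tracks. Shuffling-based null distributions are not included.

```python
def multiple_overlaps(tracks, min_tracks=2):
    """Stretches covered by at least `min_tracks` tracks, as (start, end, [track names])."""
    events = []
    for name, intervals in tracks.items():
        for start, end in intervals:
            events.append((start, 1, name))
            events.append((end, -1, name))
    # At equal coordinates, process ends (-1) before starts (+1): intervals are half-open.
    events.sort(key=lambda ev: (ev[0], ev[1]))
    active = {}  # track name -> number of currently open intervals
    result = []
    prev = None
    for pos, delta, name in events:
        if prev is not None and pos > prev:
            present = sorted(t for t, c in active.items() if c > 0)
            if len(present) >= min_tracks:
                # Extend the previous stretch if it is adjacent and has the same tracks.
                if result and result[-1][1] == prev and result[-1][2] == present:
                    result[-1] = (result[-1][0], pos, present)
                else:
                    result.append((prev, pos, present))
        active[name] = active.get(name, 0) + delta
        prev = pos
    return result
```

For `{"A": [(0, 100)], "B": [(50, 150)], "C": [(80, 90)]}`, the stretch 80-90 is reported as a three-way overlap, flanked by two-way A/B overlaps at 50-80 and 90-100. Sorting the events makes the sweep O(n log n) in the number of intervals.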
Allows integrative, context-dependent analyses of diverse local and remotely hosted datasets. CruzDB is a parallelizable programmatic interface to the University of California, Santa Cruz (UCSC) Genome Browser database that offers a syntax for common use cases, including annotation and spatial querying. It can be used with any organism and genome assembly version available in the UCSC database.
Allows users to identify the sequences reported across multiple sets of output files. Intersect identifies and visualizes sequence links among sets of homologous sequences, including very divergent relationships. It provides congruence analysis through a graphical user interface, and is also useful for examining new hypotheses about relationships among two or more families or superfamilies.
A method for identifying whether two sets of intervals are spatially correlated across a genome, with correlation detected as a deviation from the uniform distribution of one set of intervals relative to the other that would be expected if the sets were independent. GenometriCorr performs all of its analyses on each input, so that a variety of biologically significant relationships can be queried: proximity, uniform spacing, and increased or decreased overlap of intervals or points. It presents the results in a way that a biologist can readily understand.
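Testing one interval set against another with a uniform null can be sketched using relative distances. In this hypothetical Python illustration (not GenometriCorr's R code), each query point is scored by its distance to the nearer of its two flanking reference points, divided by the gap between them. If the sets are placed independently, these values should be roughly uniform on [0, 0.5], and a Kolmogorov-Smirnov statistic measures the departure. Reducing intervals to single coordinates (for example, midpoints) is an assumption of the sketch, and no p-value is computed.

```python
import bisect


def relative_distances(query_points, reference_points):
    """Distance to the nearer flanking reference point, scaled by the flanking gap."""
    ref = sorted(reference_points)
    out = []
    for q in query_points:
        i = bisect.bisect_right(ref, q)  # ref[i - 1] <= q < ref[i]
        if i == 0 or i == len(ref):
            continue  # no reference point on one side: skip this query
        left, right = ref[i - 1], ref[i]
        out.append(min(q - left, right - q) / (right - left))
    return out


def ks_statistic_uniform_half(values):
    """Kolmogorov-Smirnov distance between the values and Uniform(0, 0.5)."""
    xs = sorted(values)
    n = len(xs)
    stat = 0.0
    for k, x in enumerate(xs, start=1):
        cdf = min(max(2 * x, 0.0), 1.0)  # CDF of Uniform(0, 0.5) at x
        stat = max(stat, k / n - cdf, cdf - (k - 1) / n)
    return stat
```

Queries sitting right next to reference points give relative distances near 0 (proximity), while queries at the midpoints between references give values near 0.5 (uniform spacing); both push the statistic toward 1, whereas independent placement keeps it small.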
Allows the interrogation of significant associations between many genomic features simultaneously. GINOM treats query interval location as a random variable following a log-linear model, whose terms are formed from any combination, or interaction, of multiple reference sets. It can uncover any higher-order interaction among reference sets that has a significant effect on query interval location, making it an effective way to screen for yet-uncharacterized higher-order associations between genomic features.
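The idea of model terms formed from every combination of reference sets can be made concrete with a small Python sketch. For illustration only, it assumes the genome is divided into bins, each labeled with the reference sets covering it, and that the expected number of query intervals per bin takes the log-linear form exp(intercept + sum of coefficient times indicator). In practice the coefficients would be estimated (for example, by Poisson regression), which is not shown; all names and numbers are hypothetical rather than GINOM's implementation.

```python
import math
from itertools import combinations


def interaction_terms(set_names, max_order=None):
    """All combinations of reference sets up to `max_order` (main effects first)."""
    names = sorted(set_names)
    max_order = max_order or len(names)
    return [combo for k in range(1, max_order + 1) for combo in combinations(names, k)]


def design_rows(bin_membership, terms):
    """One row per bin: 1 for a term if every reference set in it covers the bin."""
    return [[int(all(name in members for name in term)) for term in terms]
            for members in bin_membership]


def expected_count(row, intercept, coefficients):
    """Log-linear mean: exp(intercept + sum of coefficient * indicator)."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefficients, row)))
```

With reference sets A and B, the terms are A, B, and the interaction A×B; a bin covered by both switches on all three indicators, so a non-zero A×B coefficient captures an effect beyond the two main effects. With k reference sets there are 2^k - 1 such terms, which is why `max_order` is offered as a cap.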
Allows users to explore and uncover patterns of epigenomic variability and covariability. KLTepigenome uses a functional data analysis approach based on the Karhunen-Loève transform. It can outperform ordinary correlation measures at uncovering correlations between histone marks that are similarly regulated but do not directly co-localize. The tool can be useful for large-scale exploratory analyses of variation in genomic and epigenomic maps.
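On a discretized grid, the Karhunen-Loève transform amounts to an eigen-decomposition of the covariance matrix of the profiles, the same computation as principal component analysis. The Python sketch below assumes each row is one genomic region's signal sampled at the same positions (for example, bins around a TSS, a hypothetical setup) and extracts only the leading component by power iteration. It illustrates the transform and is not KLTepigenome's code.

```python
import math


def covariance_matrix(profiles):
    """Sample covariance between positions; rows are regions, columns are positions."""
    n, p = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in profiles]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of a covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 + i for i in range(p)]  # arbitrary start; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))  # zero only for an all-zero matrix
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def component_scores(profiles, v):
    """Project each mean-centered profile onto the component `v`."""
    n, p = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n for j in range(p)]
    return [sum((row[j] - means[j]) * v[j] for j in range(p)) for row in profiles]
```

The eigenvector describes a shape of variation along the profile, and the scores say how strongly each region expresses it. Comparing such scores between marks is one way to see covariation even when the marks' peaks do not coincide position by position.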
Enables flexible genomic interval analysis. valr is an R package for manipulating genomic interval data. It provides functions to read BED, bedGraph, and VCF files into R, and supports connections to the remote UCSC and Ensembl databases. It can also be used in R Markdown documents to document exploratory data analyses and to build reproducible workflows for data processing.
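Reading BED records and intersecting two interval sets are typical operations for such a toolkit. The following Python sketch is an assumed stand-in, not valr's R functions, whose names and outputs are not reproduced here: it keeps only the first three BED columns, relies on BED's 0-based, half-open coordinates, and returns the overlapping piece of each pair of intervals on the same chromosome.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]} using the first three columns."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        fields = line.rstrip("\n").split("\t")
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2])))
    for intervals in by_chrom.values():
        intervals.sort()
    return dict(by_chrom)


def intersect(x, y):
    """Overlapping pieces of every x/y interval pair that share a chromosome."""
    out = []
    for chrom in sorted(set(x) & set(y)):
        for s1, e1 in x[chrom]:
            for s2, e2 in y[chrom]:
                if s2 >= e1:
                    break  # y is sorted by start, so no later interval can overlap
                start, end = max(s1, s2), min(e1, e2)
                if start < end:
                    out.append((chrom, start, end))
    return out
```

Usage: `intersect(read_bed(open("a.bed")), read_bed(open("b.bed")))`, where the file names are placeholders. The early `break` avoids scanning intervals that start after the current one ends, but the worst case is still quadratic; interval trees or a merged sweep scale better on genome-wide data.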
Estimates the repertoire overlap between two parasites from subsamples of their repertoires. BRO provides a statistical model for pairwise type sharing (PTS) using Bayesian inference, and produces unbiased estimates of the true repertoire overlap. The method can also inform laboratory planning by analyzing the trade-off between sequencing effort and measurement error.
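Pairwise type sharing is commonly defined as PTS = 2·n_AB / (n_A + n_B), where n_A and n_B are the numbers of distinct types in each repertoire and n_AB is the number they share. The Python sketch below computes it and shows, on hypothetical toy repertoires, why naive PTS from subsampled repertoires tends to underestimate the true overlap, which is the bias a model such as BRO aims to correct. The Bayesian model itself is not shown, and the function names are illustrative.

```python
import random


def pairwise_type_sharing(rep_a, rep_b):
    """PTS = 2 * shared types / (types in A + types in B)."""
    a, b = set(rep_a), set(rep_b)
    return 2 * len(a & b) / (len(a) + len(b))


def subsampled_pts(rep_a, rep_b, sample_size, n_rep=1000, seed=0):
    """Average naive PTS when only `sample_size` types of each repertoire are observed."""
    rng = random.Random(seed)
    a, b = sorted(set(rep_a)), sorted(set(rep_b))
    values = [
        pairwise_type_sharing(rng.sample(a, sample_size), rng.sample(b, sample_size))
        for _ in range(n_rep)
    ]
    return sum(values) / n_rep
```

Two identical repertoires of 60 types have a true PTS of 1.0, yet observing 20 types from each yields an average naive PTS of roughly one third, because most sampled types happen not to be sampled on the other side. Larger samples shrink this bias at the cost of more sequencing effort.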