Tandem repeat detection software tools | Genome annotation
Tandem repeats (TRs) represent one of the most prevalent features of genomic sequences. Due to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community and permits the creation and release of software in an open-source spirit. It integrates a range of available packages and tools for sequence analysis into a seamless whole, and it is free of charge.
Offers a platform dedicated to DNA next-generation sequencing (NGS) data analysis, annotation and visualization. DNAscan can be run in several modes, so that its performance can be adapted to focus on a specific genomic subregion or type of analysis. The application can detect a wide range of genetic variants, including single-nucleotide variants (SNVs), repeat expansions and structural variants (SVs). It can be run through Docker and Singularity.
Provides new strategies and complete solutions for fast simple sequence repeat (SSR) analysis, marker development, polymorphism screening by mapping, and graphical display of results in a genome browser alongside other genic features. GMATA is the first tool whose results allow SSR loci and SSR marker information to be viewed together with other genome features in a genome browser. GMATA is also the first software package to generate multiple statistical graphics for SSRs, and these high-quality graphics can be incorporated directly into articles for publication.
Detects several types of repeats and provides an evaluation of their significance as well as interactive visualization. REPuter was designed for studies of repetitive DNA on a genomic or inter-genomic scale. The software can compare two or more sequences by concatenating them and then searching for repeats. It uses the search engine REPfind, which relies on a suffix-tree implementation to locate exact repeats in linear time and space.
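As a rough illustration of index-based exact-repeat search, the Python sketch below uses a suffix array (a plain sorted list of suffix start positions) in place of REPuter's suffix tree, so it does not share REPfind's linear-time guarantee. The functions `suffix_array`, `lcp` and `exact_repeats` and the `min_len` threshold are hypothetical names chosen for this example. It only considers forward repeats and only compares suffixes that are adjacent in sorted order, so it does not enumerate every repeated pair.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction by sorting suffixes; adequate only for short inputs.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp(a: str, b: str) -> int:
    # Length of the longest common prefix of two strings.
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def exact_repeats(s: str, min_len: int) -> set[tuple[int, int, int]]:
    """Return (pos1, pos2, length) for suffixes adjacent in the suffix array
    that share a prefix of at least min_len characters (an exact repeat)."""
    sa = suffix_array(s)
    hits = set()
    for a, b in zip(sa, sa[1:]):
        length = lcp(s[a:], s[b:])
        if length >= min_len:
            hits.add((min(a, b), max(a, b), length))
    return hits
```

For example, `exact_repeats("ACGTTTACGTAA", 4)` returns `{(0, 6, 4)}`: the substring ACGT occurs at positions 0 and 6. Sorting places suffixes that share long prefixes next to each other, which is why only adjacent pairs need to be compared.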
A repeat-detection tool capable of labeling its training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes; the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false positive rate than WindowMasker.
Detects tandem repeats in DNA sequences without requiring the pattern or pattern size to be specified. TRF uses a probabilistic model of tandem repeats and several statistical criteria based on that model. It also uses k-tuple matching to avoid computing full alignment matrices. The software has detection and analysis components: the detection component finds candidate tandem repeats, and the analysis component produces an alignment for each candidate, together with statistics and the nucleotide sequence.
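The k-tuple idea can be illustrated with a short Python sketch under a simplified, assumed scheme: each k-tuple is compared with the most recent previous occurrence of the same k-tuple, and the distances between them are tallied as candidate periods. The function `candidate_periods` and its parameters (`k`, `max_period`, `min_matches`) are hypothetical, and the sketch omits TRF's statistical criteria and its alignment stage entirely.

```python
from collections import Counter


def candidate_periods(seq: str, k: int = 3, max_period: int = 50,
                      min_matches: int = 5) -> list[tuple[int, int]]:
    """Tally k-tuple matches by distance and return (period, count) pairs.

    Simplification: each k-tuple is matched only against the most recent
    previous occurrence of the same k-tuple, and no statistical test is applied.
    """
    last_seen: dict[str, int] = {}  # k-tuple -> most recent start position
    by_distance = Counter()         # distance -> number of k-tuple matches
    for i in range(len(seq) - k + 1):
        tup = seq[i:i + k]
        if tup in last_seen:
            d = i - last_seen[tup]
            if d <= max_period:
                by_distance[d] += 1
        last_seen[tup] = i
    return sorted((d, c) for d, c in by_distance.items() if c >= min_matches)
```

For instance, `candidate_periods("TTTT" + "CAG" * 10 + "TTTT")` returns `[(3, 25)]`, pointing to a period-3 repeat; the few matches contributed by the flanking T runs fall below `min_matches`. Because only k-tuple lookups are performed, no alignment matrix is built at this stage.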
A prototype tool, similar in concept to Tandem Repeats Finder, for identifying approximate inverted repeats (IRs) in nucleotide sequences. Candidate IRs are detected by finding short, exact, reverse-complement matches of 4-7 nt (k-tuples) between nonoverlapping fragments of a sequence. A “center” position is defined for each k-tuple match. Short k-tuples are used to detect short IRs with short spacers, and longer k-tuples are used to detect longer IRs with potentially larger spacers, typically 10-100 kb. IRF detects “clusters” of k-tuple matches that have the same or nearly the same center and fall within a small interval of sequence. Several interval sizes are prespecified, typically between 30 and 2000 nt long.
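The center computation can be sketched as follows, assuming 0-based positions: if the k-tuple starting at i matches the reverse complement of the k-tuple starting at j > i, the two arms span i..i+k-1 and j..j+k-1, so the center of the candidate IR is (i + j + k - 1) / 2. The Python function `ir_centers` and its `max_span` limit are hypothetical. It counts only matches sharing exactly the same center, whereas IRF also groups matches with nearly the same center within prespecified intervals.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    # Reverse complement of an uppercase DNA string.
    return s.translate(COMPLEMENT)[::-1]


def ir_centers(seq: str, k: int = 4, max_span: int = 100) -> dict[float, int]:
    """Map each candidate inverted-repeat center to the number of
    reverse-complement k-tuple matches that share it."""
    positions = defaultdict(list)  # k-tuple -> list of start positions
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    centers = defaultdict(int)
    for i in range(len(seq) - k + 1):
        rc = revcomp(seq[i:i + k])
        for j in positions.get(rc, []):
            # Keep only nonoverlapping arms whose total span is within max_span.
            if j >= i + k and j + k - i <= max_span:
                centers[(i + j + k - 1) / 2] += 1
    return dict(centers)
```

For example, `ir_centers("GACTGTTTTCAGTC")` returns `{6.5: 2}`: the arms GACTG and CAGTC flank the spacer TTTT, and both k-tuple matches point to its midpoint. Matches that agree on a center are the raw material for the clusters described above.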
Identifies pathogenic repeat expansions from paired-end, PCR-free whole-genome sequencing (WGS) data. ExpansionHunter computes the maximum-likelihood genotype from candidate repeat alleles determined by spanning, flanking and in-repeat reads. The method can also be applied to detect new pathogenic repeat expansions, for both short and long repeats.
Allows users to model each variable number tandem repeat (VNTR), count repeat units and detect sequence variation. For any target VNTR in a donor, adVNTR reports an estimate of the repeat unit (RU) count and of point mutations within the RUs. It trains Hidden Markov Models (HMMs) for each target VNTR locus, which provides the following advantages: (1) any portion of the unique flanking regions can be matched for read alignment; (2) homopolymer runs are separated from other indels, which helps with frameshift detection; and (3) each VNTR can be modeled individually.
Detects rare short tandem repeat (STR) expansions from whole-genome sequencing (WGS) data. STRetch highlights rare expansions at every STR locus in a given genome and estimates their approximate size directly from short-read sequencing. The application can detect both expansions at loci not previously linked to disease and known pathogenic STR expansions in short-read, PCR-free WGS data.
Allows identification of tandemly repeated structures in DNA sequences. mreps is based on an exhaustive combinatorial algorithm that finds all repeats satisfying certain mathematical properties. The software can be used to locate a particular type of tandem repeat or to perform a fast genome-wide analysis of tandemly repeated patterns. It can also identify loose (approximate) repeats through a special resolution parameter.
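To make the notion of a tandemly repeated structure concrete, the Python sketch below performs a naive search for exact maximal repetitions: maximal substrings with period p whose length is at least twice p. It is an O(n * max_period) scan, not the combinatorial algorithm used by mreps, and it has no equivalent of the resolution parameter for loose repeats. The function `maximal_repetitions` and its `max_period` and `min_exponent` parameters are hypothetical.

```python
def maximal_repetitions(s: str, max_period: int = 10,
                        min_exponent: float = 2.0) -> list[tuple[int, int, int]]:
    """Return (start, end, period) for maximal substrings s[start:end] that have
    period p and exponent (end - start) / p >= min_exponent."""
    found: dict[tuple[int, int], int] = {}  # (start, end) -> smallest period
    for p in range(1, max_period + 1):
        i = 0
        while i + p < len(s):
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend a maximal stretch where s[i] == s[i + p] holds.
            start = i
            while i + p < len(s) and s[i] == s[i + p]:
                i += 1
            end = i + p  # s[start:end] has period p
            # Skip spans already reported at a smaller period (e.g. AT vs ATAT).
            if (end - start) / p >= min_exponent and (start, end) not in found:
                found[(start, end)] = p
    return sorted((start, end, p) for (start, end), p in found.items())
```

For example, `maximal_repetitions("GGATATATCC")` returns `[(0, 2, 1), (2, 8, 2), (8, 10, 1)]`: the GG and CC homopolymers and the (AT)3 repeat. Periods are tried in increasing order so that a span is attributed to its smallest period.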
Masks simple regions (low complexity and short-period tandem repeats) in DNA, RNA, and protein sequences. The aim of tantan is to prevent false predictions when searching for homologous regions between two sequences. Simple repeats often align strongly to each other, causing false homology predictions. Moreover, it enables accurate homology search for non-coding DNA with extreme A+T composition. This should be especially useful for large-scale, fully automated homology searches, such as comparisons of whole genomes or proteomes.
Detects satellite repeats directly from unassembled short reads. TAREAN uses graph-based sequence clustering to identify groups of reads that represent repetitive elements. It takes paired-end next-generation sequencing (NGS) reads as input and outputs a list of clusters identified as putative satellite repeats, together with their genomic abundance and various cluster characteristics. The tool's performance was validated by analyzing low-pass genome sequencing data from five plant species in which satellite DNA had previously been characterized experimentally.
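A toy version of graph-based read clustering is sketched below in Python. It assumes a deliberately simple similarity rule, linking two reads when they share at least `min_shared` k-mers, and takes connected components (found with union-find) as clusters. TAREAN's actual similarity search, clustering method and satellite classification are not reproduced, and the names `kmers` and `cluster_reads` are hypothetical.

```python
from itertools import combinations


def kmers(read: str, k: int) -> set[str]:
    # Set of all k-mers occurring in a read.
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def cluster_reads(reads: list[str], k: int = 5,
                  min_shared: int = 3) -> list[list[int]]:
    """Link reads sharing at least min_shared k-mers and return the connected
    components as lists of read indices, largest component first."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = [kmers(r, k) for r in reads]
    for a, b in combinations(range(len(reads)), 2):
        if len(sets[a] & sets[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two components
    groups: dict[int, list[int]] = {}
    for i in range(len(reads)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=lambda g: (-len(g), g))
```

With three reads drawn from a 5-bp tandem repeat (ACGTT) and one unrelated read, `cluster_reads(["ACGTTACGTTACGTT", "TTACGTTACGTTACG", "GGGCCCAAATTTGGG", "CGTTACGTTACGTTA"])` returns `[[0, 1, 3], [2]]`. The all-pairs comparison is quadratic in the number of reads, so it only illustrates the idea on small inputs.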
A software tool for detecting DNA microsatellites (MSs). The system is based on a hidden Markov model and a general linear model. Users do not need to optimize the parameters of MsDetector, and neither a list of motifs nor a library of known MSs is required. MsDetector is both memory- and time-efficient: its memory requirement and run time are linear in the length of the input sequence. When applied to several species, MsDetector located the majority of MSs found by other widely used tools and also identified novel MSs. Furthermore, the system has a very low false-positive rate, resulting in a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.
Identifies complex repetitive structures in DNA called nested tandem repeats (NTRs). NTRFinder is based on an algorithm that detects the recurrence of two or more apparent tandem motifs interspersed with each other. The tool has been tested on real and simulated data. The nested tandem repeats it finds can be used as population and phylogenetic markers.
Detects predominant periodicities in a nucleotide sequence. PerPlot counts the number of times pairs of A-tracts occur in the analyzed sequence at each mutual distance. The program reads a nucleotide sequence in the standard GenBank or FASTA format. It outputs two indices that characterize the periodicity of the analyzed sequence: the height of the dominant peak and the period corresponding to that peak.
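The counting step can be sketched in Python, assuming that an A-tract is a run of at least three consecutive A's and that distances are measured between tract centers; both are assumptions made for this example, not PerPlot's documented definitions. The hypothetical function `dominant_period` returns the most frequent distance together with its count, mirroring the two indices described above.

```python
import re
from collections import Counter


def a_tract_centers(seq: str, min_len: int = 3) -> list[float]:
    # Assumed definition: an A-tract is a run of at least min_len consecutive A's.
    pattern = f"A{{{min_len},}}"  # e.g. "A{3,}"
    return [(m.start() + m.end() - 1) / 2
            for m in re.finditer(pattern, seq.upper())]


def dominant_period(seq: str, min_len: int = 3,
                    max_distance: int = 50) -> tuple[float, int]:
    """Histogram center-to-center distances between all pairs of A-tracts and
    return (period, peak height) for the most frequent distance."""
    centers = a_tract_centers(seq, min_len)
    counts = Counter()
    for i, a in enumerate(centers):
        for b in centers[i + 1:]:
            if b - a <= max_distance:
                counts[b - a] += 1
    if not counts:
        return (0.0, 0)
    # Highest count wins; ties go to the shorter distance.
    period, height = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return (period, height)
```

For a sequence with an A-tract every 10 bp, `dominant_period(("AAAA" + "CGCGCG") * 5)` returns `(10.0, 4)`: four tract pairs are 10 bp apart, against three at 20 bp, two at 30 bp and one at 40 bp.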
An approach to de novo repeat annotation that exploits characteristic patterns of local alignments induced by certain classes of repeats. PILER is a package of efficient search algorithms for identifying such patterns. Novel repeats found using PILER are reported for Homo sapiens, Arabidopsis thaliana and Drosophila melanogaster.
A user-friendly software tool for identifying microsatellites in genomic sequences. The combination of an extremely fast search algorithm with a built-in summary statistics tool makes SciRoKo well suited to full-genome analysis. SciRoKo contains two main modules: a simple sequence repeat (SSR) search module, which supports five different SSR search modes, and an SSR statistics module, notably for mismatch frequency and compound microsatellite analysis. Compared with existing tools, SciRoKo additionally supports the analysis of compound microsatellites.
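A perfect-SSR search of the kind an SSR search module performs can be sketched with regular-expression backreferences. The minimum repeat counts in `MIN_REPEATS` and the function names are hypothetical choices for this example, not SciRoKo's defaults, and the sketch finds only perfect repeats: it handles neither mismatches nor compound microsatellites.

```python
import re

# Hypothetical minimum repeat counts per motif length (not SciRoKo's defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    # A motif is primitive if it is not a power of a shorter unit (e.g. "ATAT").
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, thresholds: dict[int, int] = MIN_REPEATS
              ) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat count) for perfect SSRs meeting thresholds."""
    seq = seq.upper()
    hits = []
    for n, min_rep in thresholds.items():
        # A motif of length n followed by at least min_rep - 1 exact copies.
        pattern = re.compile(rf"([ACGT]{{{n}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, (m.end() - m.start()) // n))
    return sorted(hits)
```

For example, `find_ssrs("GG" + "AT" * 7 + "C" + "CAG" * 5 + "TT")` returns `[(2, 'AT', 7), (17, 'CAG', 5)]`. The primitivity check discards motifs such as ATAT or AA, so each repeat is reported under its shortest unit.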
Searches for significant approximate tandem repeats (ATR) of a given motif in a DNA sequence. For each region of the sequence that is similar to a tandem repeat of the input motif, STAR returns a description of the segment that it calls a "Zone", and an optimal alignment of the zone with the best possible exact tandem repeat (here, exact means perfect). STAR optimally detects all such zones in the sequence. Significance is assessed by a measure of local compressibility of the segment.
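The alignment step can be approximated with a small dynamic program: the Python sketch below computes the smallest edit distance between a given segment and any exact tandem repeat of the motif, trying every starting phase (rotation of the motif) and every length. This is an assumed simplification with hypothetical function names; it scores a single segment rather than finding zones, and it does not implement STAR's compressibility-based significance measure.

```python
def levenshtein_row_min(seg: str, template: str) -> int:
    """Edit distance between seg and the best-matching prefix of template."""
    prev = list(range(len(template) + 1))  # row for the empty prefix of seg
    for i, ch in enumerate(seg, start=1):
        cur = [i] + [0] * len(template)
        for j, tc in enumerate(template, start=1):
            cur[j] = min(prev[j] + 1,              # skip a segment base
                         cur[j - 1] + 1,           # skip a template base
                         prev[j - 1] + (ch != tc)) # match or substitution
        prev = cur
    return min(prev)  # best over all template prefix lengths


def distance_to_tandem_repeat(seg: str, motif: str) -> int:
    """Smallest edit distance between seg and any exact tandem repeat of motif
    (any starting phase, any length)."""
    best = len(seg)  # cost of aligning against an empty repeat
    # Templates of length >= 2 * len(seg) cover every length that could win.
    reps = 2 * len(seg) // len(motif) + 2
    for r in range(len(motif)):
        rotated = motif[r:] + motif[:r]
        best = min(best, levenshtein_row_min(seg, rotated * reps))
    return best
```

For instance, `distance_to_tandem_repeat("CAGCAGCTGCAG", "CAG")` is 1 (one substitution), and `distance_to_tandem_repeat("AGCAGCAG", "CAG")` is 0 because the segment is an exact repeat that starts in a different phase of the motif.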