GMATA / Genome-wide Microsatellite Analyzing Tool Package

Provides new strategies and complete solutions for fast simple sequence repeats (SSR) analyses, marker development, polymorphism screening by mapping and graphical display of results in a genome browser with other genic features. GMATA is the first tool that generates results that enable viewing SSR loci and SSR marker information along with other genome features in a genome browser. GMATA is also the first software package to generate multiple statistical graphics for SSRs, and the high-quality statistical graphics can be directly incorporated into articles for publication.

TRF / Tandem Repeats Finder

An algorithm for finding tandem repeats in DNA sequences without the need to specify either the pattern or pattern size. TRF uses the method of k-tuple matching to avoid the need for full scale alignment matrix computations. It requires no a priori knowledge of the pattern, pattern size or number of copies. There are no restrictions on the size of the repeats that can be detected. It uses percentage differences between adjacent copies and treats substitutions and indels separately. It determines a consensus pattern for the smallest repetitive unit in the tandem repeat.
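
The k-tuple matching idea can be illustrated with a minimal Python sketch. It is not TRF's implementation: it only tallies the distance between consecutive occurrences of each k-tuple, so that a frequently recurring distance hints at a candidate period without any alignment matrix. The function name `candidate_periods` and the default `k=3` are assumptions made for this illustration.

```python
from collections import Counter


def candidate_periods(seq: str, k: int = 3) -> Counter:
    """Tally distances between consecutive occurrences of each k-tuple.

    A distance that recurs often suggests a tandem repeat with that period.
    This is a toy illustration, not the statistical criteria used by TRF.
    """
    last_seen = {}          # k-tuple -> most recent start position
    distances = Counter()   # distance -> number of times observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return distances


periods = candidate_periods("ACGACGACGACG", k=3)
print(periods.most_common(1))  # [(3, 7)]
```

In this toy input every repeated 3-tuple recurs at distance 3, so the period of the `ACG` repeat stands out without the pattern or its size being specified in advance.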

SciRoKo / SSR Classification and Investigation by Robert Kofler

A user-friendly software tool for the identification of microsatellites in genomic sequences. The combination of an extremely fast search algorithm with a built-in summary statistic tool makes SciRoKo an excellent tool for full genome analysis. SciRoKo contains two main modules: a simple sequence repeat (SSR) search module, which supports five different SSR search modes and a module for SSR-statistics, notably for mismatch frequency and compound microsatellite analysis. Compared to other already existing tools, SciRoKo also allows the analysis of compound microsatellites.

tantan

Masks simple regions (low complexity and short-period tandem repeats) in DNA, RNA, and protein sequences. The aim of tantan is to prevent false predictions when searching for homologous regions between two sequences. Simple repeats often align strongly to each other, causing false homology predictions. Moreover, it enables accurate homology search for non-coding DNA with extreme A+T composition. This should be especially useful for large-scale, fully automated homology searches, such as comparisons of whole genomes or proteomes.

GECKO-CSB / GEnome Comparison with K-mers Out-of-core-Computational Synteny Block

Obsolete
A package that detects and identifies blocks of large rearrangements, taking into account repeats, tandem repeats and duplications, starting from a simple collection of ungapped local alignments. GECKO-CSB formalizes linearity and collinearity properties in a computational synteny block (CSB) framework. These properties are useful not only for detecting CSBs but also for detecting and identifying evolutionary events. GECKO-CSB is described as the first method to approach the whole process as a coherent workflow, which its authors report outperforms current state-of-the-art software tools, and it additionally classifies the type of rearrangement. GECKO-CSB is part of the GECKO software suite.

SA-SSR

An innovative algorithm based on suffix and longest common prefix arrays for efficiently detecting simple sequence repeats (SSRs) in large sets of sequences. Existing SSR detection applications are hampered by one or more limitations (e.g. speed, accuracy, ease of use). SA-SSR addresses these challenges and is presented as the most comprehensive and correct SSR detection software available. SA-SSR is reported to be 100% accurate and to have detected >1000 more SSRs than the second-best algorithm, while offering greater control to the user than any existing software.

ProGeRF / Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function

Obsolete
Extracts repetitive regions from genome and proteome sequences. ProGeRF was designed to be an efficient, fast and accurate web tool that is, above all, user-friendly and offers many ways to view and analyse the results. It provides graphical visualization and allows the results to be filtered. Further advantages are that it can be run on both genomic and proteomic data and can handle large genomic/proteomic data files.

IGRhCellID / Integrated Genomic Resources of Human Cell Lines for Identification

Collects genomic resources of common human cell lines. IGRhCellID provides short tandem repeat (STR) profiles of human cell lines and tools for use with conventional laboratory polymerase chain reaction (PCR) or DNA sequencing assays for routine examination of proper cell identification. The database contains integrated genomic information for more than 500 human cell lines annotated with eight different methods for cell identification. It allows researchers to find the available cell lines with designated genetic features and to identify commonly altered loci and genes that overlap in multiple cell lines.

RepeatAnalyzer

A software tool for storing, managing, identifying and analysing short-sequence repeats for the purpose of strain identification. RepeatAnalyzer can take a gene sequence and return the repeats it contains along with the known strain (if any) that the sequence belongs to. It does so by storing data distilled from sources on repeats at a given short-sequence repeat (SSR) locus. The data can be updated simply and searched easily for information about any known strains or repeats. All of these tasks are done in a computationally efficient manner using the Knuth-Morris-Pratt (KMP) string matching algorithm and general programming best practices. RepeatAnalyzer can also produce a map for any combination of repeats and strains in a given region, offering geographic insights into their distribution not previously available. In addition, it can calculate metrics of diversity within geographic regions.
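
For readers unfamiliar with Knuth-Morris-Pratt matching, the following is a minimal Python sketch of the textbook algorithm used to locate every occurrence of a known repeat in a gene sequence. It is not taken from RepeatAnalyzer; the function names `build_failure` and `kmp_find_all` and the example sequences are assumptions made for this illustration.

```python
def build_failure(pattern: str) -> list:
    """Failure table: fail[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: str, pattern: str) -> list:
    """Return the start index of every (possibly overlapping) occurrence
    of pattern in text, scanning text once from left to right."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to catch overlapping hits
    return hits


# Hypothetical example: locate a known repeat unit in a gene sequence.
print(kmp_find_all("GATTACAGATTACA", "TTA"))  # [2, 9]
```

The failure table lets the scan resume after a mismatch without re-reading text characters, which is what keeps the search linear in the combined length of text and pattern.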

RepLong

Allows users to identify repeat elements de novo. RepLong is standalone software that can be used to analyze lower-coverage data or to complement existing methods and improve repeat identification in long-read sequencing data. First, the program builds a network of read overlaps; it then uses network modularity optimization to isolate communities whose internal connectivity is stronger than their connectivity to the rest of the network; finally, it obtains representative reads for each community.

Lirex / Long Inverted Repeats Explorer

Allows identification of long inverted repeats (LIRs) over long genomic distances. Lirex is a cross-platform tool that allows users to specify LIR search criteria, such as the length of the region and the pattern and size of the repeats. The various secondary structures formed by the LIRs in transcripts may then be predicted based on the distribution of LIRs in a given gene. The software may assist in designing follow-up experiments to explore the function of LIRs.

TAREAN / TAndem REpeat Analyzer

Detects satellite repeats directly from unassembled short reads. TAREAN employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. It takes paired-end Next Generation Sequencing (NGS) reads as input and outputs a list of clusters identified as putative satellite repeats, their genomic abundance and various cluster characteristics. The tool performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized.

TRAP / Tandem Repeats Analysis Program

A Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.

STAR

Searches for significant approximate tandem repeats (ATR) of a given motif in a DNA sequence. For each region of the sequence that is similar to a tandem repeat of the input motif, STAR returns a description of the segment that it calls a "Zone", and an optimal alignment of the zone with the best possible exact tandem repeat (here, exact means perfect). STAR optimally detects all such zones in the sequence. Significance is assessed by a measure of local compressibility of the segment.

Red

A repeat-detection tool capable of labeling its training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes; the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false positive rate than WindowMasker.

IRF / Inverted Repeats Finder

A prototype tool for identifying approximate inverted repeats in nucleotide sequences that is similar in concept to the Tandem Repeats Finder. Candidate IRs are detected by finding short, exact, reverse-complement matches of 4-7 nt (k-tuples) between nonoverlapping fragments of a sequence. A “center” position is defined for each k-tuple match. Short k-tuples are used to detect short IRs with short spacers, and longer k-tuples are used to detect longer IRs with potentially larger spacers, typically 10-100 kb. IRF detects “clusters” of k-tuple matches having the same or nearly the same center and falling within a small interval of sequence. Several interval sizes are prespecified, typically between 30 and 2000 nt long.
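
The candidate-detection step can be sketched in Python as follows. This is only an illustration of matching k-tuples against their reverse complements, not IRF's code: the function names are hypothetical, and the "center" is taken here to be the midpoint between the first base of the left k-tuple and the last base of the right k-tuple (stored doubled to stay an integer), which is an assumption about the definition.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def inverted_ktuple_matches(seq: str, k: int = 4) -> list:
    """Find pairs of non-overlapping k-tuples where the left k-tuple
    equals the reverse complement of the right one.

    Returns (left_start, right_start, doubled_center) tuples, where
    doubled_center = left_start + (right_start + k - 1).
    """
    positions = {}  # k-tuple -> start positions seen so far
    matches = []
    for j in range(len(seq) - k + 1):
        kmer = seq[j:j + k]
        for i in positions.get(reverse_complement(kmer), ()):
            if i + k <= j:  # require non-overlapping k-tuples
                matches.append((i, j, i + j + k - 1))
        positions.setdefault(kmer, []).append(j)
    return matches


# ACGG ... CCGT: the two arms are reverse complements of each other.
print(inverted_ktuple_matches("ACGGTTTTTCCGT", k=4))  # [(0, 9, 12)]
```

Matches that share the same or nearly the same doubled center could then be grouped into the "clusters" described above; that grouping step is left out of this sketch.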

MsDetector

Obsolete
A software tool for detecting DNA microsatellites (MSs). The system is based on a hidden Markov model and a general linear model. The user is not required to optimize the parameters of MsDetector, and neither a list of motifs nor a library of known MSs is needed. MsDetector is both memory- and time-efficient: the memory requirement and the run time are linear with respect to the length of the input sequence. When applied to several species, MsDetector located the majority of MSs found by other widely used tools and, in addition, identified novel MSs. Furthermore, the system has a very low false-positive rate, resulting in a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.

Dot2dot

Scales tandem repeat analysis to whole genomes. Dot2dot is an algorithm for tandem repeat identification in a target genome. Its repeat model is general enough to capture various classes of tandem repeats with different characteristics: pathology-linked, forensic, population-analysis, genealogy-oriented, and repeats in regulatory regions. It also permits a compact representation of the dot-plot matrices that (i) allows the analysis to scale genome-wide and (ii) can be applied to other problems where dot-plots are used.

SBARS / Spectral-Based Approach for Repeats Search

A fast and efficient tool for identifying dispersed (direct, inverted) and tandem DNA repeats. The program is not aimed at the comparison of individual nucleotides. The main idea of this approach is to quickly identify the similarity of individual fragments within the query sequences, disregarding single nucleotide insertions or deletions. The current version of SBARS efficiently identifies repeated sequences and can be developed for the analysis of long insertions or deletions because of chromosome rearrangements in similar sequences. The program is an advanced real-time viewer for DNA sequences at different scales.

LEPSCAN / LatEnt Periodicity SCANner

A web server for searching for latent periodicity based on the method of modified profile analysis (MPA). LEPSCAN allows latent periodicity to be found in the presence of insertions and deletions. Period lengths range from 2 to 20 nt, excluding triplet periodicity. The results are subjected to various filtration steps to ensure their statistical significance. This web site can be a useful supplement to existing web sites for searching tandem repeats.

detectIR

A MATLAB-based program for detecting perfect and imperfect inverted repeats that uses complex numbers and vector calculation and accepts genome-scale data inputs. detectIR converts the conventional sequence string comparison used in inverted repeat detection into vector calculation on complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gap) in the middle of inverted repeats. Compared with existing popular tools, this program is reported to perform with significantly higher accuracy and efficiency. Using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays for comparison, detectIR can find many inverted repeats missed by existing tools, whose outputs often contain many invalid cases.

TROLL / Tandem Repeat Occurrence Locator

A light-weight simple sequence repeat (SSR) finder based on a slight modification of the Aho-Corasick algorithm. TROLL is designed to have a powerful yet simple interface. The operation is performed through command line interaction. The input arguments let the user indicate the SSRs’ minimum length desired, the maximum motif length and the files containing the motif list and the DNA sequence. TROLL is fast and only requires a standard personal computer (PC) to operate.

WM / WindowMasker

Identifies and masks highly repetitive DNA sequences in a genome, using only the sequence of the genome itself. WM is orders of magnitude faster than RepeatMasker (RM) because WM uses a few linear-time scans of the genome sequence, rather than local alignment methods that compare each library sequence with each piece of the genome. WindowMasker has two modules for masking DNA sequences. The WinMask module masks potentially repetitive sequences by counting the number of times different n-mers (units) occur in the genome. The DUST module identifies and masks low-complexity regions.
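
The n-mer counting idea can be illustrated with a deliberately simplified Python sketch. It is not WindowMasker's scoring scheme: it counts every n-mer in one pass and, in a second pass, soft-masks (lower-cases) every base covered by an n-mer whose count reaches a threshold. The function name and the `n` and `threshold` defaults are assumptions made for this illustration.

```python
from collections import Counter


def mask_frequent_nmers(seq: str, n: int = 4, threshold: int = 3) -> str:
    """Soft-mask bases covered by n-mers that occur at least
    `threshold` times in the sequence itself."""
    # Pass 1: count every n-mer in the sequence.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Pass 2: flag every position covered by a frequent n-mer.
    masked = [False] * len(seq)
    for i in range(len(seq) - n + 1):
        if counts[seq[i:i + n]] >= threshold:
            for p in range(i, i + n):
                masked[p] = True
    return "".join(c.lower() if m else c.upper()
                   for c, m in zip(seq, masked))


print(mask_frequent_nmers("ACGTACGTACGTTTGCA"))  # acgtacgtacgtTTGCA
```

Both passes are linear in the sequence length for a fixed `n`, and no repeat library is consulted, which mirrors the reason given above for the speed advantage over alignment-based masking.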

MREPATT / Multiple consecutive REpeated PATTerns in DNA sequences

A program to determine the number, length and position of exact consecutive repeats of short sequences in DNA fragments or whole genomes. The program also gives the statistical significance of results by comparing them with those expected for a random sequence generated according to a Markovian model. The program works as follows: once some patterns are defined, the program searches for consecutive repetitions of each pattern within the given query sequences.
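
The search step can be sketched in Python as follows. This is an illustrative scan for exact consecutive copies of a single user-defined pattern, not MREPATT's code, and it omits the Markovian significance test entirely; the function name and the `min_copies` parameter are assumptions.

```python
def find_consecutive_repeats(seq: str, pattern: str,
                             min_copies: int = 2) -> list:
    """Return (start, copies) for each run of at least `min_copies`
    exact back-to-back copies of `pattern` in `seq`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    step = len(pattern)
    runs = []
    i = 0
    while i <= len(seq) - step:
        # Count how many copies of the pattern follow each other from i.
        copies = 0
        while seq.startswith(pattern, i + copies * step):
            copies += 1
        if copies >= min_copies:
            runs.append((i, copies))
            i += copies * step  # jump past the whole run
        else:
            i += 1
    return runs


# Three consecutive CAG copies start at index 2; the lone CAG near the
# end is below min_copies and is not reported.
print(find_consecutive_repeats("TTCAGCAGCAGTTCAG", "CAG"))  # [(2, 3)]
```

The run length in bases is `copies * len(pattern)`, which together with the start index gives the number, length and position reported for each repeat.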

OMWSA / Optimized Moving Window Spectral Analysis

A method and visualization tool for DNA repeat detection. The spectrogram obtained by OMWSA clearly displays the general distribution of repetitive sequences. More importantly, repeats that are heavily mutated or interleaved with each other can be detected with high resolution, thanks to the discriminating ability of the parametric spectral analysis used in the algorithm. Compared with traditional Fourier transform (FT)-based spectral analysis, the method produces fewer artifacts and more reliable results in both visual and numerical analysis. OMWSA also compares favorably with existing software for both short and long repeat detection.

PRAP / Prokaryotic Repeats Annotation Program

Automates the analysis of repeats in both finished and draft genomes. It is aimed at identifying full spectrum repeats at the scale of the prokaryotic genome. Compared with the major existing repeat finding tools, PRAP exhibits competitive or better results. The results are consistent with manually curated and experimental data. Repeats can be identified and grouped into families to define their relevant types. The final output is parsed into the European Molecular Biology Laboratory (EMBL)/GenBank feature table format for reading and displaying in Artemis, where it can be combined or compared with other genome data. It is currently the most complete repeat finder for prokaryotes and is a valuable tool for genome annotation.

XSTREAM

A powerful genome data-mining tool designed to efficiently identify tandem repeat (TR) patterns in biological sequence data. XSTREAM uses a seed-extension strategy coupled with several post-processing algorithms to analyze FASTA-formatted protein or nucleotide sequences. It uses a number of user-defined parameters to identify non-redundant TR sequences with diverse periods and domain sizes, and varied levels of degeneracy. Additionally, XSTREAM effectively merges discontinuous TRs into larger TR domains, clusters similar TR sequences, models TR domain architectures, and detects hierarchical TR patterns.