Amino acid repeat prediction software tools | Protein sequence data analysis
An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments.
Detects and analyzes regions with short gapless repeats in protein sequences or alignments. REPPER is a web server whose programs use a sliding window to show the boundaries of periodic regions and to detect multiple regions with different periodicities in the same protein. Users can supply a multiple sequence alignment as input, or have a profile calculated for a single input sequence using PSI-BLAST with two iterations and an E-value cutoff of 0.001.
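The sliding-window idea can be illustrated with a small Python sketch. This is not REPPER's implementation: the function name `window_periodicity`, the self-match score and the default parameters are assumptions chosen for illustration only. For each window, it tries every candidate period and keeps the one at which the most residues match the residue one period further along.

```python
def window_periodicity(seq, window=20, max_period=10):
    """Return (start, best_period, score) for every window of the sequence.

    Hypothetical scoring: the fraction of positions i in the window whose
    residue equals the residue at i + period. A gapless repeat with that
    period gives a score close to 1.0.
    """
    results = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        best_period, best_score = 0, 0.0
        for period in range(1, max_period + 1):
            pairs = len(chunk) - period
            if pairs <= 0:
                break
            matches = sum(chunk[i] == chunk[i + period] for i in range(pairs))
            score = matches / pairs
            # Keep the first period that reaches the highest score.
            if score > best_score:
                best_period, best_score = period, score
        results.append((start, best_period, score if False else best_score))
    return results
```

Because the score is reported per window, regions with different periodicities in the same protein appear as runs of windows that share a best period, and the positions where the best period changes approximate the region boundaries.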
A profile-based method that uses a P-value-dependent score offset to include divergent repeat units and that exploits the tendency of repeats to occur in tandem. TPRpred detects not only tetratricopeptide repeat (TPR)-like repeats, but also the related pentatricopeptide repeats (PPRs) and SEL1-like repeats. The corresponding profiles were generated through iterative searches, by varying the threshold parameters for inclusion of repeat units into the profiles, and the best profiles were selected based on their performance on proteins of known structure. TPRpred performs significantly better in detecting divergent repeats in TPR-containing proteins, and finds more individual repeats than the existing methods.
A powerful genome data-mining tool designed to efficiently identify tandem repeat (TR) patterns in biological sequence data. XSTREAM uses a seed-extension strategy coupled with several post-processing algorithms to analyze FASTA-formatted protein or nucleotide sequences. It uses a number of user-defined parameters to identify non-redundant TR sequences with diverse periods and domain sizes, and varied levels of degeneracy. Additionally, XSTREAM effectively merges discontinuous TRs into larger TR domains, clusters similar TR sequences, models TR domain architectures, and detects hierarchical TR patterns.
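A seed-extension strategy can be sketched in a few lines of Python. This is a simplified illustration, not XSTREAM's algorithm: the function names, the fixed `period` argument, the exact-match seed and the `min_identity` threshold are all assumptions. The seed is two adjacent identical copies; the extension step appends further copies as long as each one stays similar enough to the copy before it, which allows some degeneracy.

```python
def identity(a, b):
    """Fraction of aligned positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def seed_and_extend(seq, period, min_identity=0.7):
    """Return (start, end) of the first tandem repeat with the given period.

    Hypothetical simplification: the period is supplied by the caller and
    no gaps are allowed. Returns None when no seed is found.
    """
    n = len(seq)
    for start in range(n - 2 * period + 1):
        # Seed: two adjacent, exactly identical copies of length `period`.
        if seq[start:start + period] != seq[start + period:start + 2 * period]:
            continue
        end = start + 2 * period
        # Extension: add whole copies while they resemble the previous copy.
        while (end + period <= n
               and identity(seq[end - period:end], seq[end:end + period]) >= min_identity):
            end += period
        return start, end
    return None
```

For example, `seed_and_extend("MKPGASPGASPGTSWWWWW", period=4)` seeds on the two exact `PGAS` copies and then extends over the degenerate copy `PGTS`, which matches the previous copy at three of four positions.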
A method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs.
A program for ab initio identification of tandem repeats. T-REKS is based on clustering the lengths between identical short strings using a K-means algorithm. Its link to the Protein Repeat DataBase opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to nucleotide sequences.
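The first step of this idea, measuring the lengths between identical short strings, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the default string length `k=3` are hypothetical, and a simple frequency count stands in for the K-means clustering that T-REKS uses.

```python
from collections import Counter, defaultdict


def kmer_distances(seq, k=3):
    """Collect distances between consecutive occurrences of identical k-mers."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    distances = []
    for occurrences in positions.values():
        distances.extend(b - a for a, b in zip(occurrences, occurrences[1:]))
    return distances


def dominant_period(seq, k=3):
    """Return the most frequent distance, or None if no k-mer recurs.

    Stand-in for clustering: a Counter replaces K-means here, so near-equal
    distances caused by insertions or deletions are not grouped together.
    """
    distances = kmer_distances(seq, k)
    if not distances:
        return None
    return Counter(distances).most_common(1)[0][0]
```

In a sequence containing a tandem repeat, most recurring short strings sit one repeat length apart, so that length dominates the collected distances. Clustering the distances instead of counting exact values makes the approach tolerant of repeat units whose lengths vary slightly.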
A de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses.
A web server for the de novo identification of repeats in protein sequences, which is based on the pairwise comparison of profile hidden Markov models (HMMs). Its main strength is its sensitivity, allowing it to detect highly divergent repeat units in protein sequences whose repeats could as yet only be detected from their structures.