Amino acid repeat databases | Protein sequence data analysis
A large proportion of proteins contain repetitive motifs, which are generated by internal duplications and frequently correspond to structural and functional units of the protein. Many of these repeats can be identified in protein sequences using a range of approaches. Several repeat families have been studied because of their relevance to areas such as human health, neurodevelopment and protein engineering, to name just a few. An open question regarding repeat proteins is whether other common repeat structures exist that have gone undetected. After all, the most common way to detect repeat families so far has been to annotate the sequence family manually and only afterwards recognize its structural repetitiveness by eye. Such an approach is difficult to apply to the entire Protein Data Bank (PDB), especially considering the many uncharacterized protein structures deposited by the main structural genomics consortia. A systematic description of repeat structures therefore depends on automated methods that detect them directly in protein structures.
Analyzes data from thousands of prokaryotic genomes in order to understand what drives the evolution and diversity of the aminoacyl-tRNA synthetase (AARS) superfamily. The Prokaryotic AARS Database is an online resource that was developed for the rapid and sensitive detection of AARS proteins encoded in genome sequences. It helps to identify organisms with alternative pathways that are involved in maintaining the fidelity of the genetic code.
A systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea and bacteria that may be involved in the adaptation of these organisms to saline conditions. The database lists various physicochemical properties for each protein, including molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
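Several of the properties listed above are simple functions of amino acid composition. The sketch below shows how three of them are commonly defined: amino acid composition as mole percent, GRAVY as the mean Kyte-Doolittle hydropathy per residue, and the aliphatic index using Ikai's weighting of Ala, Val, Ile and Leu. The function names are illustrative only, and nothing here is taken from the database's own implementation, which may differ in detail.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the mole percent of each residue present in seq."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    comp = amino_acid_composition(seq)
    return (comp.get("A", 0.0)
            + 2.9 * comp.get("V", 0.0)
            + 3.9 * (comp.get("I", 0.0) + comp.get("L", 0.0)))
```

A positive GRAVY value indicates a predominantly hydrophobic sequence and a negative value a hydrophilic one, which is why the measure is useful when comparing proteins adapted to different environments.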
Supplies a set of all currently known human polyQ repeat-containing proteins. Although the original PolyQ resource provided basic information for each entry, it lacked both depth and breadth of annotation as well as functionality. The database now contains a variety of structural and functional annotations, such as polyQ protein disease models in mouse, protein 3D structure, Pfam domains, post-translational modification (PTM) sites, single point mutations and complementary protein annotations, and also covers the domain context of polyQ repeats.
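As an illustration of what a polyQ repeat is at the sequence level, the sketch below scans a protein sequence for uninterrupted runs of glutamine (Q) and reports their positions. The function name and the default minimum length of four residues are assumptions made for this example; the database's own criteria for calling a polyQ repeat are not stated here and may differ.

```python
import re


def find_polyq_tracts(seq, min_len=4):
    """Return (start, end, length) for each uninterrupted run of Q.

    Positions are 1-based and inclusive, as is usual for protein
    coordinates. min_len is an illustrative threshold, not a value
    taken from the database.
    """
    pattern = re.compile(r"Q{%d,}" % min_len)
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]
```

For example, `find_polyq_tracts("MAQQQQQPLQQ")` reports a single tract spanning residues 3 to 7; the trailing `QQ` is ignored because it is shorter than the threshold.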
Aids in choosing optimal conditions for the expression, purification and characterization of a cyanobacterial protein. CyanoPhyChe is a collection of calculated properties for all cyanobacterial proteins: physicochemical properties, solubility, the probability of an expressed protein entering an inclusion body, structural stability, polarity and secondary structure. Users can also export the physicochemical properties, predicted secondary structure, amino acid sequence and amino acid composition of selected cyanobacterial proteins for further analysis.
Provides annotated tandem repeat protein structures. RepeatsDB includes high-quality annotations for ∼5400 protein structures and records the start and end positions of the repeat regions and repeat units for all entries. The extensive growth in repeat unit characterization was made possible by applying the ReUPred annotation method to the entire Protein Data Bank. Data quality is supported by manual validation of more than 60% of the entries. The updated web interface includes a search engine for complex queries and a fully redesigned entry page that gives a better overview of the structural data. Unit positions can be compared alongside secondary structure, fold information and Pfam domains.
Concerns protein repeat sequences. ProRepeat is an online library that also gathers the nucleotide sequences corresponding to the repeat fragments for the purpose of codon usage analysis. It provides the means to explore the function and evolution of protein repeats. It contains repeats from over 80 completely sequenced eukaryotic proteomes, including 14 vertebrates, 8 plants, 22 fungi, 12 insects and 29 other organisms.
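To show what codon usage analysis of a repeat fragment involves, the sketch below counts the codons in the coding sequence of a repeat and returns their relative frequencies. The function name and input conventions are assumptions for this example and do not describe ProRepeat's own tools.

```python
from collections import Counter


def codon_usage(cds):
    """Return the relative frequency of each codon in a coding sequence.

    The sequence is read in frame from its first base, so its length
    must be a multiple of three. RNA input (U) is converted to DNA (T).
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}
```

Applied to the nucleotide sequence behind a glutamine repeat, such as `codon_usage("CAGCAGCAACAG")`, this shows how the repeat is split between the two glutamine codons CAG and CAA, which is the kind of comparison that stored nucleotide sequences make possible.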
A database of amino acid repeat-containing proteins found in lower eukaryotic pathogens. The RepSeq database is accessed via a web-based application which also provides links to related online tools and databases for further analyses. The database allows for both individual and cross-species proteome analyses and also allows users to upload sequences of interest for analysis by the RepSeq algorithm.