Provides a set of multiple sequence alignments and hidden Markov models (HMMs) for protein families. Each Pfam entry aims to capture the diversity of a set of evolutionarily related sequences: a representative subset of all matching sequences is aligned to build the seed alignment. This database provides more than 16,700 entries, and related entries are grouped into clans on the basis of similarity of sequence, structure or profile-HMM.
Provides clusters of orthologous groups (COGs) and updated annotation of those COGs. In the COG database, organisms are sorted according to the NCBI Taxonomy database. Each gene entry in a COG is now identified by its GenInfo Identifier (gi) number in the NCBI protein database and is linked to the corresponding entry in NCBI’s RefSeq database. The database focuses on prokaryotes (bacteria and archaea).
Provides identification and extensive annotation of protein domains, and allows users to explore protein domain architectures. The SMART database contains manually curated models for more than 1,300 protein domains. Its interface includes a full-text search engine that lets users identify domains or proteins based on their annotation and other associated text.
Consists of an online resource and protein sequence collection. SUPERFAMILY is both a database and a website offering a variety of methods to analyze whole proteins and domains. The database focuses on the superfamily level, but additionally provides protein domain assignments at the family level. The website also provides a server-side pipeline for timely processing of whole-genome protein annotations.
Provides structure-based sequence alignments of protein domain superfamilies in correspondence with Structural Classification of Proteins (SCOP) definitions. PASS2 focuses on distantly related superfamily members, which can diverge into multiple Pfam (Protein families) families, each containing a more closely related and reliable set of homologues. The database deliberately includes single-member superfamilies.
Helps users classify protein domains based on their evolutionary relationships, considering both remote and close homology. ECOD is an online repository that places remote homologs sharing a common ancestor in the same homology group (H-group), while clustering close homologs more finely into families (F-groups). This database presents domains from different perspectives: functional, structural and homology-based. It also facilitates focused studies of specific families and of protein annotation.
Allows protein sequence analysis and genome annotation. PRINTS-S stores motifs in the form of ungapped, local sequence alignments. It can model relationships explicitly by defining parent–child and sibling relations.
Provides a motif descriptor database. PROSITE offers an annotated collection of biologically meaningful motif descriptors dedicated to the identification of protein families and domains. This database uses two kinds of motif descriptors: (i) patterns, or regular expressions, in which only the most significant residue information is retained, and (ii) generalized profiles, quantitative motif descriptors that consider the overall similarity over the entire length of domains or proteins.
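Because PROSITE patterns are essentially regular expressions, they can be matched with an ordinary regex engine once their syntax is translated. The Python sketch below illustrates this for the core pattern syntax: '-' separates elements, 'x' stands for any residue, [..] lists allowed residues, {..} lists forbidden residues, (n) or (n,m) gives a repetition count, and '<' or '>' anchors the pattern to the N- or C-terminus. The function name prosite_to_regex is hypothetical (this is not an official PROSITE tool), and rarer constructs, such as a terminus symbol written inside brackets, are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate the core PROSITE pattern syntax into a Python regex.

    Hypothetical helper for illustration only. Supports '-' separators,
    'x', [..], {..}, (n) / (n,m) repetition and '<' / '>' anchors at the
    start or end of an element. Bracketed terminus forms such as [G>]
    are not supported.
    """
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue spec from an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # single residue or [..] class: same syntax in regex
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


if __name__ == "__main__":
    # N-glycosylation site pattern (PROSITE PS00001).
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)  # N[^P][ST][^P]
    print(re.search(regex, "MKNGSA").start())  # 2
```

A regex translation only covers the pattern side of PROSITE; generalized profiles score whole alignments quantitatively and need a dedicated scanner.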
A resource consisting of curated multiple sequence alignments, hidden Markov models (HMMs) for protein sequence classification, and associated information, designed to support automated annotation of (mostly prokaryotic) proteins.
Stores a collection of diagnostic protein family fingerprints. PRINTS is a public-domain database. Each fingerprint has been defined and iteratively refined using database-scanning procedures within the ADSP or VISTAS sequence analysis packages. Users can access data by accession number, PRINTS code, database code, text, sequence, title, number of motifs or author, or through a query language.
Provides superfamily-level structural assignments for genes listed in the non-redundant protein sequence database. GenDiS is a compendium of sequence domains of evolutionarily related proteins, grouped at the superfamily level in direct correspondence with the SCOP and PASS2 databases. The database lets users obtain information on taxonomic and superfamily distributions. It also provides links to taxonomy and other protein databases, as well as several tools for genome and cross-genome analysis.
A resource for the functional characterization and labeling of protein sequences grouped by their characteristic conserved domain architecture. The SPARCLE interface associates domain architectures with functional descriptions. Subfamily domain architectures, as defined by SPARCLE, vary widely in their coverage and functional diversity. How precisely this classification resolves specific functions depends directly on the availability of sufficiently specific models in the domain model collection of NCBI's Conserved Domain Database (CDD).
Provides structure- and sequence-based predictions of Structural Classification of Proteins (SCOP) annotations. AutoPSI is a database of predicted SCOP annotations based on two predictors, Vorolign and AutoSCOP. It aims to bridge the gap between SCOP releases by covering proteins that are structurally resolved in the PDB but not yet classified by SCOP, and to bridge the gap between protein sequences stored in UniProt and known structures in the PDB. The database can help to map the protein sequence-structure space.
A manually curated sequence and amino acid motif database collecting all phycobilin lyases and related protein sequences available in public databases. CyanoLyase provides an extensive collection of phycobilin lyases and related proteins, classified into clans, subclans, families and subfamilies. The website also gives access to bioinformatics tools that ease the annotation of these sequences in forthcoming genomes of phycobiliprotein-containing organisms. The website will be updated regularly as new data become available and should therefore serve as a long-term resource. Users can follow the latest changes to the database directly from the web interface, using the corresponding drop-down menu.
Organizes data obtained through in vitro protein evolution into a structured, easily searchable form. ASPD aims to support researchers undertaking new work in molecular biology. It can assist in protein annotation and structure-function analysis. The database allows users to submit the results of their in vitro selection experiments and lets them define how the output of a query is represented.
Provides a hierarchical organization of protein families. ProtoNet aims to achieve automatic hierarchical clustering of the protein sequence space. The database automatically generates a consistent classification tree, without supervision. Additional analysis and visualization tools extend the user’s options for navigating the ProtoNet tree. Both a “simplified” and an “advanced” mode are available.
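ProtoNet's exact clustering procedure is not described here. As a generic illustration of how unsupervised hierarchical clustering turns pairwise similarities into a classification tree, the sketch below uses textbook average-linkage agglomeration: repeatedly merge the two clusters with the highest mean pairwise similarity until one root remains. The function name average_linkage_tree, the input format (a dictionary of pairwise similarity scores, for example derived from sequence alignment scores) and the toy values are all hypothetical; this is not ProtoNet's implementation.

```python
from itertools import combinations


def average_linkage_tree(names, sim):
    """Build a binary tree by average-linkage agglomerative clustering.

    names: list of sequence identifiers (hypothetical toy input).
    sim:   dict mapping frozenset({a, b}) -> similarity (higher = closer).
    Returns nested tuples recording the order in which clusters merged.
    """
    # Each active cluster maps its tree node to the set of leaves it holds.
    clusters = {name: {name} for name in names}

    def avg_sim(a_leaves, b_leaves):
        # Mean similarity over all leaf pairs drawn from the two clusters.
        total = sum(sim[frozenset((a, b))] for a in a_leaves for b in b_leaves)
        return total / (len(a_leaves) * len(b_leaves))

    while len(clusters) > 1:
        # Merge the pair of clusters with the highest average similarity.
        x, y = max(
            combinations(clusters, 2),
            key=lambda pair: avg_sim(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(x) | clusters.pop(y)
        clusters[(x, y)] = merged  # the new internal node of the tree

    (root,) = clusters
    return root


if __name__ == "__main__":
    toy_scores = {
        frozenset(("A", "B")): 0.9,
        frozenset(("C", "D")): 0.8,
        frozenset(("A", "C")): 0.1,
        frozenset(("A", "D")): 0.2,
        frozenset(("B", "C")): 0.1,
        frozenset(("B", "D")): 0.2,
    }
    print(average_linkage_tree(["A", "B", "C", "D"], toy_scores))
    # (('A', 'B'), ('C', 'D'))
```

This naive version recomputes all cluster pairs at every step, which is fine for a toy example but far too slow for the full protein sequence space.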
Helps identify subfamily-specific residues. CUBE-DB is a database of pre-calculated results, including visualizations and modifiable spreadsheets. The database serves to detect functional divergence in human/vertebrate protein families. The CUBE-DB analysis is organized around the nomenclature associated with human proteins and is based on all available vertebrate genomes. The presentation of results can be elaborated and improved in several ways, for example by dynamically linking the visualizations of the alignments, the scores and their mapping onto the structure.