Provides a set of multiple sequence alignments and hidden Markov models (HMMs) for protein families. Pfam is constructed by capturing the diversity of a set of evolutionarily related sequences. It aligns a representative subset of the entire set of matching sequences to build the seed alignment. This database provides more than 16700 entries which are related by similarity of sequence, structure or profile-HMM.


Provides a motif descriptor database. PROSITE offers an annotated collection of biologically meaningful motif descriptors dedicated to the identification of protein families and domains. This database uses two kinds of motif descriptors: (i) patterns, or regular expressions, in which only the most significant residue information is retained, and (ii) generalized profiles, quantitative motif descriptors that consider the overall similarity over the entire length of domains or proteins.
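Because PROSITE patterns are essentially regular expressions, a pattern can be translated into a regular expression and scanned against a sequence. The following is a minimal sketch of that idea, not PROSITE's own scanning software. The function name `prosite_to_regex` and the example sequence are illustrative assumptions, and only the core pattern syntax is handled (literal residues, `x`, `[...]`, `{...}`, repetition counts, and `<` / `>` anchors outside brackets).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; it does not cover every corner of
    the PROSITE syntax (for example, anchors placed inside square brackets).
    """
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by a period
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # '<' pins the match to the N-terminus
        anchor_end = element.endswith(">")      # '>' pins the match to the C-terminus
        element = element.strip("<>")

        # Separate the residue specification from an optional repetition
        # suffix such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()

        if core == "x":                 # any residue
            regex = "."
        elif core.startswith("{"):      # any residue except the listed ones
            regex = "[^" + core[1:-1] + "]"
        else:                           # literal residue or [ABC] alternatives
            regex = core

        if low:
            regex += "{" + low + ("," + high if high else "") + "}"

        parts.append(("^" if anchor_start else "") + regex + ("$" if anchor_end else ""))
    return "".join(parts)


# The N-glycosylation site pattern, written in PROSITE syntax.
n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")

# Made-up sequence used purely for demonstration.
sequence = "MKNGSAANPTQ"
hits = [(m.start(), m.group()) for m in re.finditer(n_glyc, sequence)]
```

Here `n_glyc` is the string `N[^P][ST][^P]`, and `hits` contains the single match `(2, "NGSA")`; the later `NPTQ` segment is rejected because proline follows the asparagine. Note that `re.finditer` reports non-overlapping matches only, so a real scanner would need a lookahead or a sliding start position to report overlapping sites.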

SPARCLE / Subfamily Protein Architecture Labeling Engine

A resource for the functional characterization and labeling of protein sequences that have been grouped by their characteristic conserved domain architecture. The SPARCLE interface associates domain architectures with functional descriptions. Subfamily domain architectures as defined by the SPARCLE resource vary widely in their coverage and functional diversity. The resolution of this protein classification with respect to specific function depends directly on the availability of specific reagents in NCBI's Conserved Domain Database (CDD) domain model collection.
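The idea of grouping sequences by domain architecture and attaching a functional label to each architecture can be sketched in a few lines. This is only an illustration of the concept under simple assumptions, not SPARCLE's actual procedure: an architecture is taken to be the ordered tuple of domain names found on a protein, and every identifier, domain name, and label below is made up.

```python
from collections import defaultdict


def group_by_architecture(protein_domains):
    """Group protein identifiers by their ordered tuple of domain names."""
    groups = defaultdict(list)
    for protein_id, domains in protein_domains.items():
        # The ordered tuple of domains acts as the architecture key.
        groups[tuple(domains)].append(protein_id)
    return dict(groups)


# Hypothetical input: protein identifier -> domains in N- to C-terminal order.
protein_domains = {
    "protA": ["SH2", "kinase"],
    "protB": ["SH2", "kinase"],
    "protC": ["kinase"],
    "protD": ["kinase", "SH2"],
}

# Hypothetical functional labels keyed by architecture.
architecture_labels = {
    ("SH2", "kinase"): "SH2 domain-containing kinase",
    ("kinase",): "kinase",
}

groups = group_by_architecture(protein_domains)

# Assign each architecture its label, or mark it as unlabeled.
labeled = {
    architecture: (architecture_labels.get(architecture, "unlabeled"), members)
    for architecture, members in groups.items()
}
```

Because the key is an ordered tuple, `protD` (kinase followed by SH2) lands in a different group from `protA` and `protB`, and it stays "unlabeled" since no label exists for that architecture. This mirrors the point made above that how specifically a protein can be labeled depends on which models and labels are available.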

ECOD / Evolutionary Classification Of protein Domains

A hierarchical classification of protein domains according to their evolutionary relationships. ECOD classifies proteins with experimentally determined spatial structures from the Protein Data Bank (PDB) database. Compared with other classification databases, such as SCOP and CATH, ECOD emphasizes distant evolutionary relationships and is updated every week to include new structures released in the PDB. Because of the delay in the PDB FTP release and the time needed to classify new domains, ECOD usually falls two to three weeks behind the PDB.


Provides value-added views for UniProt protein entries and Protein Information Resource Super Family (PIRSF) entries with extensive annotation information and graphical displays. iProClass offers a comprehensive integrated view of protein information to facilitate knowledge discovery and to serve as a framework for data integration in a distributed networking environment. The database provides descriptions of proteins, with links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy.


Integrates disparate data sets related to ancient life. LUCApedia is designed for users interested in quickly collecting evidence of deep ancestry for a small number of protein families. Users can query the database for a single protein to collect evidence of its antiquity from a broad range of studies. The database allows users to corroborate the results from independent studies, to evaluate early life hypotheses, and to direct future experiments towards understudied areas.

GenDiS / Genomic Distribution of Protein Structural Domain Superfamilies

Provides structural assignments to genes listed within the non-redundant protein sequence database at the superfamily level. GenDiS is a compendium of sequence domains of evolutionarily related proteins grouped at the superfamily level in direct correspondence with SCOP and PASS2 databases. The database allows users to obtain information on taxonomic and superfamily distribution. It also provides links to taxonomy and other protein databases and several tools for performing genome and cross-genome analysis.


A manually curated sequence and amino acid motif database gathering all the different phycobilin lyases and related protein sequences available in public databases. CyanoLyase provides an extensive collection of phycobilin lyases and related proteins, classified in clans, subclans, families and subfamilies. The website also gives access to bioinformatic tools to ease the annotation of these sequences in forthcoming genomes of phycobiliprotein-containing organisms. As such, the website will be updated regularly as new data become available and will therefore be a long-term resource. Users can monitor directly from the web interface the latest changes that have occurred in the database using the corresponding scrolldown menu.


Identifies subfamily-specific residues. CUBE-DB is a database of pre-calculated results that includes visualizations and modifiable spreadsheets. This database supports the detection of functional divergence in human/vertebrate protein families. The CUBE-DB analysis is organized around a nomenclature associated with the human proteins, and it is based on all available vertebrate genomes. The result presentation can be elaborated and improved in several ways, for example by dynamically linking the visualizations for the alignments, the scores, and their mapping onto the structure.

LEAPdb / Late Embryogenesis Abundant Proteins database

Harbors a comprehensive data set for late embryogenesis abundant proteins (LEAP) with tools designed for their online analysis. LEAPdb provides a curated archive of LEAP families to navigate, interpret, and understand this data. The database includes functions to search, describe and analyze LEAP. It can be useful for compiling and organizing data concerning LEAP, classifying the various subfamilies, designing experiments to elucidate the function of these proteins, and analyzing LEAP structure-function relationships.


Provides a platform for the comparative study of Plasmodium genomes. Plasmobase is a database that reports known and new protein domains identified by DAMA and CLADE on the 11 fully sequenced genomes in PlasmoDB. This resource proposes a visualization of domain architectures. It also allows for an easy comparison among architectures within Plasmodium species and all other species in UniProt. A friendly interface permits users to interact with the platform to access new annotations and possibly detect annotation errors.

ProFITS / Protein Families Involved in the Transduction of Signalling

Categorizes transcription factors (TFs), protein kinases/phosphatases (PKs/PPs) and ubiquitin-proteasome system (UPS)-related genes in maize. ProFITS is a database that provides users with a comprehensive profile of genes involved in signal transduction, to facilitate studies on signal transduction in the maize genome. It contains more than 30,000 genes and 50,000 transcripts of the maize genome and more than 50,000 full-length cDNA (FLcDNA) sequences. The database also provides several analysis and exploration tools.

AutoPSI / Automated Protein Structure Identification

Provides structure- and sequence-based Structural Classification Of Proteins (SCOP) database predictions. AutoPSI is a database of predicted SCOP annotations based on two predictors, Vorolign and AutoSCOP. It aims to bridge the gap between SCOP releases for proteins that are structurally resolved in the PDB but not yet classified by SCOP, as well as the gap between protein sequences stored in UniProt and known structures in the PDB. The database can assist in clarifying the protein sequence-structure space.

SMoS / Structural Motifs of Superfamilies

Provides important sequence segments that serve as structural motifs of protein superfamilies. SMoS is a database that integrates the structural features of these motifs for several superfamilies. The provided segments can be regarded as the minimum structural requirements for a new member to be considered part of a pre-existing superfamily. Structural motifs provided in this resource have applications in sequence searches, sequence alignments and distant homology modelling. They can also help to rationalize and design mutation experiments in proteins.

MTB-PCDB / Mycobacterium tuberculosis Proteome Comparison Database

Provides integrated access to proteome sequence comparison data. MTB-PCDB is a comprehensive database that helps users easily navigate and retrieve information for analysis. It includes five strains of Mycobacterium tuberculosis (H37Rv, H37Ra, CDC 1551, F11 and KZN 1435) sequenced completely so far. This information also facilitates the design of new antitubercular vaccines and therapeutic agents based on the identified virulence-associated mutations.

PCBOST / Protein Classification Based On Structural Trees

Provides a hierarchically organized database of protein structures. PCBOST is a protein classification based on structural trees. This structural classification of proteins is based on spatial structural similarity and on common folding pathways simulated with the trees. The classification disregards the amino acid sequences, functions, and evolutionary relationships of proteins, which are taken into account in other known classifications.

PairsDB / Pairs Database

Offers a way to facilitate the establishment of family relationships between all known protein sequences. PairsDB aims to assign functions to novel proteins and to identify conserved parts in the protein sequences. It allows exploration of protein sequences and their similarity relationships. The database is able to report distant neighbours and display query-anchored stacked alignments. It can be useful for determination of orthologous genes in completely sequenced genomes.