Provides a set of multiple sequence alignments and hidden Markov models (HMMs) for protein families. Pfam is constructed by capturing the diversity of a set of evolutionarily related sequences. It aligns a representative subset of the entire set of matching sequences to build the seed alignment. This database provides more than 16700 entries which are related by similarity of sequence, structure or profile-HMM.
Provides clusters of orthologous groups (COGs) and updated annotation of those COGs. COGs is a database where organisms are sorted according to the NCBI Taxonomy database. Each gene entry in a COG is now denoted by its gene index (gi) number in the NCBI protein database and is linked to the respective entry in the NCBI’s RefSeq database. It concentrates on prokaryotes (bacteria and archaea).
Provides identification and extensive annotation of protein domains, and allows the exploration of protein domain architectures. The SMART database contains manually curated models for more than 1300 protein domains. This interface includes a full text search engine allowing users to identify domains or proteins based on their annotation and other associated text.
Consists of an online resource and protein sequence collection. SUPERFAMILY is both a database and website resource that offers a variety of methods to analyze whole proteins and domains. The database focuses on the superfamily level, but additionally provides protein domain assignments at the family level. The website also provides a server-side pipeline for processing of whole genome protein annotations in a timely manner.
Provides structure-based sequence alignments of protein domain superfamilies in correspondence with Structural Classification of Proteins (SCOP) definitions. PASS2 deals with distantly related members that diversify into multiple Pfam (Protein families) families, each of which includes a more closely related and reliable set of homologues. The database deliberately includes single-member superfamilies.
Assists users to classify protein domains based on their evolutionary relationships, considering both remote and close homology. ECOD is an online repository that groups remote homologs that share common ancestors in the same Homology group (H-group) while recognizing fine clustering of close homologs by families (F-group). This database presents domains under different perspectives: functional, structural, and homology-based. Moreover, it simplifies dedicated studies about specific families and protein annotations.
Allows protein sequence analysis and genome annotation. PRINTS-S stores motifs in the form of un-gapped, local sequence alignments. It can model relationships explicitly by defining parent–child and sibling relations.
Provides a motif descriptor database. PROSITE offers an annotated collection of biologically meaningful motif descriptors dedicated to the identification of protein families and domains. This database uses two kinds of motif descriptors: (i) patterns, or regular expressions, in which only the most significant residue information is retained, and (ii) generalized profiles, which are quantitative motif descriptors that consider the overall similarity over the entire length of domains or proteins.
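To make the pattern descriptor concrete, here is a minimal sketch of how a PROSITE-style pattern can be translated into an ordinary regular expression. The function name `prosite_to_regex` is hypothetical and not part of any PROSITE tool. The sketch assumes the common pattern syntax only: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` as terminal anchors outside brackets.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Illustrative only: handles the common syntax, not every edge case.
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# N-glycosylation site pattern: Asn, anything but Pro, Ser or Thr, anything but Pro.
glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(glyco, "AANGSAA")
```

A pattern of this kind either matches or does not match, which is why it is a qualitative descriptor; a generalized profile instead assigns a score to every position.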
A resource consisting of curated multiple sequence alignments, Hidden Markov Models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins.
Stores a collection of diagnostic protein family fingerprints. PRINTS is a public domain database. Each fingerprint has been defined and iteratively refined using database scanning procedures within the ADSP or VISTAS sequence analysis packages. Users can access data by accession number, PRINTS code, database code, text, sequence, title, number of motifs, author, or by query language.
Provides structural assignments to genes listed within the non-redundant protein sequence database at the superfamily level. GenDiS is a compendium of sequence domains of evolutionarily related proteins grouped at the superfamily level in direct correspondence with SCOP and PASS2 databases. The database allows users to obtain information on taxonomic and superfamily distribution. It also provides links to taxonomy and other protein databases and several tools for performing genome and cross-genome analysis.
Provides structure- and sequence-based predictions of Structural Classification Of Proteins (SCOP) annotations. AutoPSI is a database of predicted SCOP annotations based on two predictors, Vorolign and AutoSCOP. It aims to bridge the gap between SCOP releases for proteins that are structurally resolved in the PDB but not yet classified by SCOP, as well as the gap between protein sequences stored in UniProt and known structures in the PDB. The database can assist in clarifying the protein sequence-structure space.
A resource for the functional characterization and labeling of protein sequences that have been grouped by their characteristic conserved domain architecture. SPARCLE interface proposes to associate domain architectures with functional descriptions. Subfamily domain architectures as defined by the SPARCLE resource vary widely in their coverage and functional diversity. The resolution of this protein classification with respect to specific function depends directly on the availability of specific reagents in the NCBI's Conserved Domain Database (CDD) domain model collection.
A manually curated sequence and amino acid motif database gathering all the different phycobilin lyases and related protein sequences available in public databases. CyanoLyase provides an extensive collection of phycobilin lyases and related proteins, classified in clans, subclans, families and subfamilies. The website also gives access to bioinformatic tools to ease the annotation of these sequences in forthcoming genomes of phycobiliprotein-containing organisms. As such, the website will be updated regularly as new data become available and will therefore be a long-term resource. Users can monitor directly from the web interface the latest changes that have occurred in the database using the corresponding scrolldown menu.
Consists of a data structure of protein families. ProtoNet aims to achieve an automatic hierarchical clustering of the protein sequence space. The database automatically generates, with no supervision, a consistent classification tree. Additional tools for analysis and visualization enhance the user’s navigation options through the ProtoNet tree. A “simplified” and an “advanced” mode are both available.
Arranges the data obtained by means of in vitro protein evolution into a structured and easily searchable array. ASPD aims to facilitate the work of researchers carrying out new work in molecular biology. It can assist in protein annotation and structure-function analysis. This database allows users to submit the results of their in vitro selection experiments. It permits definition of the output representation from a query.
Allows users to find subfamily-specific residues. CUBE-DB is a database of pre-calculated results which includes visualizations and modifiable spreadsheets. This database serves to detect functional divergence in human/vertebrate protein families. The CUBE-DB analysis is organized around a nomenclature associated with the human proteins, and it is based on all available vertebrate genomes. The result presentation can be elaborated and improved on in several ways, for example by dynamically linking the visualizations for the alignments, the scores, and their mapping onto the structure.
Aims to order proteins that function in the eye into homologous families and sequence clusters. EyeSite assists users in investigation of the relationship between protein structure and the underlying causes of disease. It gathers a list that shows the generic family names of the sequences found in the tissue of the eye. This database is searchable by tissue type, protein family and species and offers a web interface.
Collects biological units (BUs) and asymmetric units (ASUs) from both the Protein Quaternary Server and the Protein Data Bank. ProtBuD is a repository that focuses on assisting users in locating template structures for homology modeling with specific contents at the level of the BU of structure. It allows users to retrieve BUs and ASUs by PDB entries and to make comparisons between BUs according to various features such as chain composition or interfaces.
Stores best representative profiles (BRP) of protein families. 3PFDB is a database designed to find the best representative sequence (BRS) for each Pfam family. Users can also search new sequences against the representative profiles using two sequence homology detection methods, HMMER and FASSM. This approach was tested on a dataset of over 100 families.
Harbors a comprehensive data set for late embryogenesis abundant proteins (LEAP) with tools designed for their online analysis. LEAPdb provides a curated archive of LEAP families to navigate, interpret, and understand this data. The database includes functions to search, describe and analyze LEAP. It can be useful for compiling and organizing data concerning LEAP, classifying the various sub-families, designing experiments to elucidate the function of these proteins, and analyzing LEAP structure-function relationships.
Categorizes transcription factors (TFs), protein kinases/phosphatases (PKs/PPs) and ubiquitin proteasome-system (UPS)-related genes in maize. ProFITS is a database that provides users with a comprehensive profile of genes involved in signal transduction, to facilitate studies on signal transduction in the maize genome. It contains more than 30,000 genes and 50,000 transcripts of the maize genome and more than 50,000 full length cDNA (FLcDNA) sequences. The database also provides several analysis and exploration tools.
Provides known and new protein domains identified by Co-Occurrence Domain Detection (CODD) on several major human pathogens selected from the EuPathDB database. EuPathDomains can be queried by protein names, domain identifiers, Pfam or InterPro identifiers. It offers users the possibility to limit the search to an organism or a taxon. This database improves the domain coverage in all genomes by localizing new occurrences of domains that are already known.
Provides a hierarchically organized database of protein structures. PCBOST is a protein classification based on structural trees. This structural classification of proteins is based on the spatial structural similarity and common folding pathways simulated with the trees. The classification disregards the amino acid sequences, functions, and evolutionary relationships of proteins which are taken into account in other known classifications.
Provides a platform for the comparative study of Plasmodium genomes. Plasmobase is a database that reports known and new protein domains identified by DAMA and CLADE on the 11 fully sequenced genomes in PlasmoDB. This resource proposes a visualization of domain architectures. It also allows for an easy comparison among architectures within Plasmodium species and all other species in UniProt. A friendly interface permits users to interact with the platform to access new annotations and possibly detect annotation errors.
Provides value-added views for UniProt protein entries and Protein Information Resource Super Family (PIRSF) entries with extensive annotation information and graphical displays. iProClass offers a comprehensive integrated view of protein information to facilitate knowledge discovery and to serve as a framework for data integration in a distributed networking environment. The database provides descriptions of proteins, with links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy.
Provides integrated access to proteome sequence comparison data. MTB-PCDB is a comprehensive database that helps users navigate and retrieve information easily for analysis. It includes five strains of Mycobacterium tuberculosis (H37Rv, H37Ra, CDC 1551, F11 and KZN 1435) sequenced completely so far. This information also facilitates design of new antitubercular vaccines and therapeutic agents based on the identified virulence-associated mutations.
Provides predicted domain gene ontology (GO) annotations. SDADB is an online resource that integrates structural neighborhood features together with a variety of heterogeneous information, including SCOP-InterPro domain mapping information, PSSMs and sequence homolog features. This database can be queried through the protein/domain accession number or protein/domain name. It also gives valuable insights into protein domain evolution.
Provides important sequence segments. SMoS is a database that integrates the structural features of such motifs for several superfamilies. The provided segments can be projected as the minimum structural requirements for a new member to be considered part of a pre-existing superfamily. Structural motifs provided in this resource have applications in sequence searches, sequence alignments and distant homology modelling. This can also help to rationalize and design mutation experiments in proteins.
Provides information about gene families. FUnkFams supplies information to detect protein families without annotation domains and identifies them in metagenomics data. It also assigns annotations to all sequences in each family using one or more annotation databases. It can record the taxonomy of the organism from which each sequence is derived.
A database and resource of protein families in Arthropod genomes. ProtoBug platform presents the relatedness of complete proteomes from 17 insects as well as a proteome of the crustacean, Daphnia pulex. The represented proteomes from insects include louse, bee, beetle, ants, flies and mosquitoes.
Provides increased protein family coverage based on the PRINTS database. PrePRINTS is an automatically generated database that contains conserved motifs used to characterise a protein family and fingerprints which may be used for sequence diagnosis. PrePRINTS is constructed by aligning sequences extracted from PRODOM, a complete set of automatically generated protein domain families from the UniProt knowledge base.
A database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. CyanoClust contains protein homology information for 38 cyanobacteria, 59 plastids and 1 Paulinella chromatophore as well as 10 bacteria that are clustered by the Gclust software. Major conserved clusters are annotated semi-automatically. The Virtual Linkage Group (VLG) is also indicated.
Aids function association in genome analysis by remote homology detection. SUPFAM is a database that includes sequence families of yet unknown structure in a known superfamily of known structures. This online resource recognises and documents related integral membrane protein families. It also groups sequence families, without a detectable relationship with a family of known structure, into new potential superfamilies.
Provides a collection of over 100 000 protein families. FIGfams are sets of isofunctional homologues. Each one contains a set of proteins that are end-to-end homologous and share a common function, and can be defined as a four-tuple: ID, protein-set, decision-procedure and family-function. Results from the literature are incorporated into the FIGfams via Subsystem curation, which helps keep the FIGfams up to date.
Integrates disparate data sets related to ancient life. LUCApedia is designed for users interested in quickly collecting evidence of deep ancestry for a small number of protein families. Users can query the database for a single protein to collect evidence of its antiquity from a broad range of studies. The database allows users to corroborate the results from independent studies, to evaluate early life hypotheses, and to direct future experiments towards understudied areas.
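The four-tuple described above can be pictured as a small record type. The following is a minimal sketch, assuming a hypothetical `FigFam` class with invented field names, IDs and sequences; the toy decision procedure (presence of a signature string) is purely illustrative and is not how FIGfams actually decide membership.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class FigFam:
    """Hypothetical record mirroring the four-tuple in the description."""
    family_id: str                               # ID
    protein_set: FrozenSet[str]                  # member protein identifiers
    decision_procedure: Callable[[str], bool]    # does a sequence belong?
    family_function: str                         # shared function of members

    def accepts(self, sequence: str) -> bool:
        # Delegate the membership decision to the family's own procedure.
        return self.decision_procedure(sequence)

# Toy decision procedure (assumption): a sequence belongs to the family
# if it contains an invented signature segment.
def has_signature(sequence: str) -> bool:
    return "GKSGT" in sequence

example = FigFam(
    family_id="FAM_EXAMPLE_0001",                # invented identifier
    protein_set=frozenset({"protA", "protB"}),   # invented members
    decision_procedure=has_signature,
    family_function="hypothetical ATP-binding protein",
)
```

Bundling the decision procedure with the family means that each family carries its own rule for admitting new members, rather than relying on one global similarity threshold.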
Offers a way to facilitate the establishment of family relationships between all known protein sequences. PairsDB aims to assign functions to novel proteins and to identify conserved parts in the protein sequences. It allows exploration of protein sequences and their similarity relationships. The database is able to report distant neighbours and display query-anchored stacked alignments. It can be useful for determination of orthologous genes in completely sequenced genomes.
Introduces and applies a novel algorithm for quantitative functional comparison between domains. ELISA is a database created to solve a long-standing problem in the domain evolution community: the consolidation of different types of comparisons for a single dataset. This method can be used both for studies of evolution under different types of pressure, such as structure, function and phylogeny, and for prediction.
Allows comparative analysis of protein lengths in completely sequenced prokaryotic and eukaryotic genomes. PPD enables the classification of proteins into groups based on their lengths, calculation of the percentage change (reduction or increase) in proteins of a specific length among the chosen organisms, exploration of the resulting classification, and online visualization of length distribution profiles and amino acid composition profiles. It also provides links to COG and KEGG. The main page supplies a list of all the available genomes.
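The length-based grouping and percentage-change calculation can be illustrated with a short sketch. The function names, the bin width of 100 residues, and the definition of percentage change (relative difference in counts against a reference organism) are all assumptions made here for illustration; the description does not state how PPD defines them.

```python
from collections import Counter

def length_groups(lengths, bin_size=100):
    """Count proteins per length group; keys are the lower bound of each bin."""
    return dict(Counter((n // bin_size) * bin_size for n in lengths))

def percentage_change(reference_count, other_count):
    """Relative change in count versus a reference organism.

    Positive values indicate an increase, negative values a reduction.
    """
    return 100.0 * (other_count - reference_count) / reference_count

# Invented protein lengths for two hypothetical organisms.
organism_a = [50, 120, 150, 310]
organism_b = [130, 140, 160, 320]

groups_a = length_groups(organism_a)
groups_b = length_groups(organism_b)
change_100_199 = percentage_change(groups_a[100], groups_b[100])
```

Because genomes differ in total protein count, a real comparison would likely normalize each group by proteome size before computing the change; that step is left out to keep the sketch short.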
Dr. Yashwanth Subbannayya obtained his M.Sc. degree in Medical Biochemistry from Manipal University. He qualified in the competitive CSIR-UGC National Eligibility Test and joined the Institute of Bioinformatics, Bangalore as a UGC Junior Research Fellow. As part of his Ph.D. work, he studied the molecular mechanisms of gastric cancer in clinical specimens using quantitative proteomic technologies. This study, the results of which were published in Cancer Biology and Therapy, yielded a novel therapeutic target for gastric cancer, CAMKK2. Further, he also studied the serum proteome of gastric cancer patients and developed assays for potential markers using the multiple reaction monitoring approach. The results of this study were published in Journal of Proteomics. In addition to his research work, he trained extensively in sample preparation for mass spectrometry and fractionation techniques, and gained expertise in quantitative proteomic techniques and data analysis. He also trained extensively in various validation platforms including immunohistochemistry, multiple reaction monitoring and Western blot. He has also worked as a curator for several biological databases including NetPath, Human Protein Reference Database (HPRD) and Breast cancer database. His work in various research projects has yielded him 23 publications either as lead author or co-author in peer-reviewed journals. He is a reviewer for the journal Proteomics.
Dr. Yashwanth Subbannayya joined the YU-IOB Center for Systems Biology and Molecular Medicine in June, 2015. During the initial period, his job consisted of assisting other personnel of the university in the establishment of YU-IOB Center for Systems Biology and Molecular Medicine. He was also involved in training of Ph.D. students in biological aspects. After the establishment of the center, he trained in cell culture techniques and metabolomics analysis. At YU-IOB CSBMM, he is studying the molecular mechanisms in various cancers including oral cancer. In addition, he is studying the molecular mechanisms as well as the metabolic constituents of traditional medicine formulations using mass spectrometry technologies. In June 2016, he convened the national symposium “Genomics in clinical practice: Future of precision medicine” held at Yenepoya University on June 1 and 2, 2016. The resource persons included 16 individuals from various academic organizations as well as industry. The symposium was attended by 218 participants from 24 institutions around India. He is a member of the Scientific Review Board of Yenepoya Research Centre where he facilitates timely scientific review of research projects.
Sangram keshari sahu Computational Biology and Bioinformatics
Indian Institute of Science Education and Research Mohali
A bioinformatics graduate. Junior Research Fellow at IISER Mohali doing a project in Computational Biology.