Provides access to several biological data resources and bioinformatics services. EBI is a platform that covers the entire range of biological sciences, from raw DNA sequences to curated proteins, chemicals, structures, systems, pathways, ontologies and literature. Databases, tools and web services are provided for sharing data, performing queries and analyzing results. Users can also deposit their data through a data submission page. All the resources are freely available without restriction, with few exceptions.
Offers seamless integration of and navigation through protein-related data. NeXtProt contains proteomics data for over 85% of human proteins. Moreover, this tool includes over 8000 phenotypic observations for over 4000 variations in a number of genes involved in hereditary cancers and channelopathies. All of the data are available via a user interface and FTP site. API access and a SPARQL endpoint are also provided for more technical applications.
A centralized, standards compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. PRIDE is a core member in the ProteomeXchange (PX) consortium, which provides a single point for submitting mass spectrometry based proteomics data to public-domain repositories. Datasets are submitted to PRIDE via ProteomeXchange and are handled by expert biocurators.
Enables users to access, discover and disseminate omics data sets. OmicsDI is an open-source platform that can integrate proteomics, genomics, metabolomics and transcriptomics data sets. This platform stores biological and technical metadata from these public data sets using an efficient indexing system that can integrate different biological entities, including genes, transcripts, proteins, metabolites and the corresponding publications from PubMed.
Enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology. ProteomicsDB is a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data. It contains 81,721 unique phosphorylated peptides representing 11,025 human genes, demonstrating that more than half of all human proteins are substrates of kinases.
An atlas of protein expression of Medicago truncatula in association with Sinorhizobium meliloti. The Medicago Proteome Atlas provides evidence for more than 23013 protein groups (19679 from the eukaryotic host plant, M. truncatula; 3334 from S. meliloti) along with 20120 phosphorylation sites and 734 lysine acetylation sites. Further mining of this proteomic resource may enable engineering of crops and their microbial partners to increase agricultural productivity and sustainability.
A web-based resource to aid analysis of existing biological data and inspire future biological investigations. Phosphomouse presents experimental data about tissue-specific protein abundance and phosphorylation, including 12000 proteins and 36000 phosphorylation sites from 9 mouse tissues. These data revealed distinctive and complementary protein and phosphoprotein expression profiles that support each tissue’s unique physiology. Moreover, by combining protein abundance measurements with phosphorylation observations, we could distinguish tissue-specific phosphorylation of ubiquitous proteins from phosphorylation of tissue-specific proteins.
Translates the human proteome into molecular and digital tools for drug discovery, personalized medicine and life science research. The ProteomeTools project is a joint effort of the Technical University of Munich (TUM), JPT Peptide Technologies, SAP SE and Thermo Fisher Scientific. It aims to use synthetic reference peptides to create reference mass spectra and covers all human proteins, important post-translational modifications thereof and other interesting biology such as disease associated mutations, HLA neo antigens, small open reading frames (ORFs) or translated lincRNAs.
Helps users analyze, display and share results from proteomics projects. EPD is an online repository giving access to data generated by the Lamond group. It contains information from multiple, complex data sets and quantitative studies on human cells and model organisms. Moreover, the results can be visualized in the form of interactive graphs and plots.
Provides data about the model organism Paramecium tetraurelia. ParameciumDB was created by using components of the Generic Model Organism Database (GMOD). It offers gene expression data from genome-wide transcriptome experiments using a NimbleGen custom microarray platform. The database can be consulted through a detailed view that provides protein alignments and links to external databases.
Compiles properties of more than 53 000 characterized mammalian proteins. HumanPSD provides associations of human proteins with diseases and their potential use as biomarkers. It reports drugs that target human proteins. All information is classified by molecular functions, biological roles, localization and modifications of proteins, and expression patterns across cells, tissues, organs and tumors.
Contains several body fluid proteomes, including plasma, urine, and cerebrospinal fluid. MAPU consists of several sub-databases containing different proteomes. This resource is organized into four branches: body fluids, tissues, cell types, and organelles. Some of these branches contain a clickable map to access the relevant proteomes. It also estimates a P-value for the identification of each protein where possible.
A database of physicochemical and structural properties and novel functional regions in plant proteomes. Plant-PrAS covers Arabidopsis, soybean, poplar, rice, moss and algae. It provides calculated and predicted physicochemical parameters (Length, Charged, Nonpolar, Acidic, Basic, Low complexity, GRAVY and pI), secondary structural properties (Solvent accessibility, β sheet, Intrinsically disordered regions, Signal peptide cleavage sites, Transmembrane helices, S-S bond and Domain linker), functional annotation (Pfam, Uniprot-plant, Uniprot-sprot, EC number, PDB and KOG), functional regions (PASS and Rosetta stone proteins) and other features (Ubiquitylation site, N-glycosylation site, O-glycosylation site, Subcellular location and Protein solubility).
Gathers information about bitopic proteins from six complete genomes (Homo sapiens, Arabidopsis thaliana, Dictyostelium discoideum, Saccharomyces cerevisiae, Escherichia coli and Methanocaldococcus jannaschii), corresponding to each kingdom. Membranome is a database which compiles 3D models of transmembrane (TM) domains, organized following a customized classification, for over 6000 bitopic proteins accompanied by their related structural and functional information.
Helps to study molecular or phenotypic effects of as yet uncharacterized proteins. FlyDev is a database and a visualization tool that provides high-resolution proteomes of Drosophila melanogaster. Results are presented as a treemap or a scatterplot of terms clustered based on the first 2 components of a principal component analysis (PCA) of the information content (IC) similarity scores.
Provides multi-omic data about inflammatory bowel disease (IBD). IBDMDB considers the gut microbial ecosystem as a target for diagnosis, therapy, and mechanistic understanding of IBD. It can be used to estimate if microbial composition predicts subsequent risk of flares in disease activity. This platform is useful for determining the response to a given therapy using the stool microbiota.
Provides 'proteome' sets of proteins thought to be expressed by organisms whose genomes have been completely sequenced. UniProt proteomes is a database that gives access to “Reference proteomes”, which are well-annotated proteomes for model organisms and organisms of interest for biomedical research and phylogeny. These may include both manually reviewed (UniProtKB/Swiss-Prot) and unreviewed (UniProtKB/TrEMBL) entries.
Aims to collate any relevant data pertaining to any PE2-4 protein. Missing ProteinPedia makes it possible to define, summarize and discuss all available data for the so-called missing proteins, emphasizing why they may currently be difficult to observe or find using standard proteomics mass spectrometry (MS) and antibody-based techniques. It supports the generation of high-confidence MS evidence for as many PE2-4 proteins as possible.
Generates human induced pluripotent stem cells (iPSCs) from hundreds of healthy individuals as well as patients diagnosed with selected diseases. HipSci is a powerful resource to evaluate and quantify cell responses to chemical, physical and biological stimuli using novel assays and artificial microenvironments. Within this framework, phenotypic data are being collated with genomics, epigenomics and proteomics data to discover the impact of their variation on the cellular phenotype.
A repository built to house phosphoprotein, phosphopeptide, and phosphosite data specific to Medicago. Medicago PhosphoProtein Database holds 3457 unique phosphopeptides that contain 3404 non-redundant sites of phosphorylation on 829 proteins. Through the web-based interface, users are allowed to browse identified proteins or search for proteins of interest. Furthermore, it allows users to conduct BLAST searches of the database using both peptide sequences and phosphorylation motifs as queries. The data contained within the database are available for download to be investigated at the user’s discretion.
Displays Rab annotation for all genomes available as a part of Superfamily 1.75. RabDB is a database created to explore the universe of the Rab family of small GTPases, key regulators of the eukaryotic endomembrane system, as predicted by the Rabifier classification pipeline in the sequenced eukaryotic genomes. It is designed to enable the cell biology community to keep pace with the increasing number of fully sequenced genomes and to change the scale at which comparative analysis is performed in cell biology.
Contains some information from post-genomic experiments that use the model bacterium Escherichia coli K12. EchoBASE allows consultation of protein–protein interaction (PPI) data, structural data and bioinformatics studies, proteomics studies, and microarray data. This resource provides useful information to predict biological functions for uncharacterized gene products. It enables manipulation of data from genome-wide experiments.
Provides a resource of protein phosphorylation data from multiple plants. With the large-scale phosphorylation data and associated web-based tools, P3DB will be a valuable resource for both plant and nonplant biologists in the field of protein phosphorylation.
Collects information about the role of synapse proteins in physiology and disease. G2Cdb intends to centralize warehousing of data on the synaptic proteome. The database includes mouse and human genomic annotation resources. Users can rapidly determine whether a particular gene is found in a brain-signalling complex and whether it has been altered and studied in experimental paradigms of learning and memory.
An integrated web-based resource that catalogues the genomic and proteomic annotations identified in colorectal cancer (CRC) tissues and cell lines. The data catalogued to-date include sequence variations as well as quantitative and non-quantitative protein expression data. The database enables the analysis of these data in the context of signaling pathways, protein–protein interactions, Gene Ontology terms, protein domains and post-translational modifications. Currently, Colorectal Cancer Atlas contains data for >13 711 CRC tissues, >165 CRC cell lines, 62 251 protein identifications, >8.3 million MS/MS spectra, >18 410 genes with sequence variations (404 278 entries) and 351 pathways with sequence variants. Overall, Colorectal Cancer Atlas has been designed to serve as a central resource to facilitate research in CRC.
Contains metabolome and proteome data in plasma obtained from 5,093 healthy volunteers in a Japanese population. jMorp minimizes biases by using a single protocol in a single institute, the Tohoku Medical Megabank Cohort Study. It offers a graphical viewer that displays correlations between metabolites. This database is built using large-scale cohort data for healthy volunteers with various health records and genome data, and provides significant genome-wide association study (GWAS) results.
Consists of a compendium of endogenously tagged human proteins and their time-lapse microscopy movies. Dynamic Proteomics provides the annotation of the tagged proteins, alignment of protein dynamics for proteins of interest, sequence search and comparison of up to 50 input sequences to all the complementary DNAs (cDNAs) in the library. It offers a search for gene names, DNA sequences, protein description, image or published localization and exon-tag insertion point.
Provides an overview of the diversity of many sub-nuclear compartments. The NPD is a curated database and all information is supported by links to published material. This tool provides a resource for researchers as well as a gateway for students to explore the complexity of the mammalian nucleus. More than 1300 vertebrate nuclear proteins reported in the literature have also been archived in NPD.
Provides access to the results of computer-assisted sequence analysis of mouse homologues of KIAA cDNA (mKIAA cDNA) that were isolated. ROUGE is a subsidiary database of the Human Unidentified Gene-Encoded (HUGE) protein database that contains about 1000 mKIAA cDNA entries. The two databases have the same basic organization, with a gene/protein characteristic table for each cDNA entry that summarizes the results from computer-assisted analysis of the cDNA sequence and the deduced amino acid sequence.
Complements the Genome Browser in covering the proteome world. UCSC Proteome Browser provides a wealth of protein information presented in the form of graphical images of tracks, histograms and links to other Internet sites. This resource offers an option to automatically generate either Postscript or PDF format files of the images to support the publication and presentation needs of users.
Contains data on biochemically characterized proteins. CharProtDB provides a source of transitive function assignments that can be used in annotation pipelines. Each annotation contains (1) gene name, (2) symbol and various controlled vocabulary terms, (3) Enzyme Commission number and (4) TransportDB accession. A BLAST sequence similarity search is provided through the CharProtDB web interface, which searches a user-submitted query sequence against the entire CharProtDB data set.
Provides access to lipid-associated protein sequences and annotations. LMPD is a database that stores more than 8,000 genes and 12,000 proteins from several species, including human. Users can browse the protein list, with an option to browse by associated lipid category. The ‘Advanced’ query form includes options to search by database ID or keyword and to filter by species and/or lipid class association.
Compiles information about soybean functional analysis. Soybean Proteome Database gathers proteome data collected from plants in flooding stress conditions. It provides information about proteins identified on 2-DE maps of proteins extracted from a wide variety of tissues and subcellular compartments of soybean. The database is particularly focused on the developmental stage occurring 0-7 days after seedling emergence.
Stores and applies customizable sets of naming rules to correct and standardize gene and protein names within an annotated genome or metagenome. PNU is a web-based database that allows users to create and maintain their own naming rules and organize these rules in projects that can be shared with the community. It can help relieve researchers from extensive manual curation of their genomes.
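The rule-based renaming described for PNU can be illustrated with a minimal sketch. The rule set, the `NAMING_RULES` name and the `standardize_name` function below are hypothetical and are not taken from PNU itself; the sketch only assumes that a naming rule can be modelled as an ordered pattern/replacement pair.

```python
import re

# Hypothetical naming rules, applied in order: (compiled pattern, replacement).
# These examples are illustrative only and are not PNU's actual rules.
NAMING_RULES = [
    (re.compile(r"\bputative\b\s*", re.IGNORECASE), ""),
    (re.compile(r"\bhypothetical protein\b.*", re.IGNORECASE), "hypothetical protein"),
    (re.compile(r"\s+"), " "),
]


def standardize_name(name, rules=NAMING_RULES):
    """Apply each naming rule in turn and return the cleaned product name."""
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name)
    return name.strip()


if __name__ == "__main__":
    for raw in ["Putative  ABC transporter", "hypothetical protein XYZ_123"]:
        print(raw, "->", standardize_name(raw))
```

Keeping the rules as data rather than code is what would let a project-specific rule set be edited and shared independently of the function that applies it.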
A literature-based, manually curated, protein-centric database of rice proteins. MCDRP provides experimental data embedded in published articles in a computer-searchable format. The database has data for over 1800 rice proteins curated from >4000 different experiments of over 400 research articles. It also has protein–protein interaction data for 199 rice proteins as well as DNA–protein interaction data for 51 rice proteins.
Provides convenient and interactive search tools allowing users to retrieve, to analyze and also to predict mobile RNAs/proteins. Each entry in the PlaMoM database contains detailed information such as nucleotide/amino acid sequences, ortholog partners, related experiments, gene functions and literature. The resource provides a built-in tool to identify potential RNA mobility signals such as tRNA-like structures. The current version of PlaMoM compiles a total of 17 991 mobile macromolecules from 14 plant species/ecotypes from published data and literature.
Gathers information about proteomes and two-dimensional gels. DynaProt 2D is a database that allows users to search by different characteristics: gene, protein, PID or 2D gel. Its interface consists of three parts: the search including the result set, the interactive reference maps and the summarized protein information table.
Compiles a collection of experimentally observed translated genomic elements (TGEs) derived from proteomics informed by transcriptomics (PIT) experiments. PITDB includes a record for each TGE accompanied by information about evidence at the mRNA and peptide level and metadata about the corresponding observed sample(s). Each entry can be visited individually or by experiment.
Gathers multiple reaction monitoring (MRM)-based targeted proteomics assay data from PASSEL, CPTAC, PanoramaWeb, SRMAtlas and PeptideTracker. MRMAssayDB is a web-based resource that accumulates the targeted proteomics assays available in the community. The information is integrated and annotated with additional information on biological pathways, protein-protein interactions, Gene Ontology terms, known post-translational modifications (PTMs), disease-associated mutations and disease involvement. Users can perform basic and advanced search queries and filter and sort the search results.
Provides a platform for analyzing variation. The HUMA interface includes tools to analyze variation in protein sequence and structures. It contains information about sequence, structure, variation and disease. It aggregates data from various sources such as UniProt or Ensembl. It gives users the ability to upload their own private datasets.
Offers complete nonredundant data sets representing the human, mouse and rat proteomes, built from the Swiss-Prot, TrEMBL, Ensembl and RefSeq databases. IPI is a nonredundant human proteome set that was used in the primary analysis of the human genome sequence. It provides a species-specific, complete and non-redundant dataset particularly suited to supporting protein identification in proteomics experiments. Its sequence- and identifier-based construction eliminates the need for manual filtering of redundant results in protein identification, while maintaining cross-references to the source data.
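The idea of a nonredundant set that keeps cross-references to its source databases can be sketched as follows. This is a simplified illustration, not IPI's actual procedure: it assumes that entries are merged only when their sequences are exactly identical, and the function name `build_nonredundant` and the accessions in the usage example are made up.

```python
from collections import defaultdict


def build_nonredundant(records):
    """Collapse identical sequences into one entry that keeps all source cross-references.

    records: iterable of (source_db, accession, sequence) tuples.
    Assumption: redundancy means exact sequence identity (case-insensitive).
    """
    groups = defaultdict(list)
    for source_db, accession, sequence in records:
        groups[sequence.upper()].append((source_db, accession))
    # One entry per distinct sequence, with sorted cross-references.
    return [
        {"sequence": seq, "xrefs": sorted(xrefs)}
        for seq, xrefs in groups.items()
    ]


if __name__ == "__main__":
    example = [
        ("Swiss-Prot", "P1", "MKT"),    # made-up accessions
        ("Ensembl", "ENSP1", "mkt"),
        ("RefSeq", "NP_1", "MAA"),
    ]
    for entry in build_nonredundant(example):
        print(entry)
```

Because each merged entry carries its list of source accessions, a protein identification made against the merged set can still be traced back to the original databases.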
Provides a repository of bacterial eukaryotic proteomes. PQI collects up to 3200 proteomes and allows users to browse, filter and download data. This database intends to facilitate measurement of proteome quality via a 5-star rating system supported by 11 different metrics of quality. Each proteome includes information such as sequencing technology used, publication count, and numerous automated scoring metrics based on protein composition and phylogenetic placement.
Provides a repository of structure and function annotations on the 'missing proteins' of the human proteome. HPSF hosts missing proteins that have not been validated at the protein level, which are first extracted from the neXtProt database. The structure folding simulations are then generated by I-TASSER with all homologous templates excluded from the threading libraries. Finally, the functional insights of each protein, including enzyme commission, gene ontology, ligand-binding and subcellular location, are provided by the structure-based function annotation tool, COFACTOR. One goal of the HPSF database is to construct a comprehensive repository of annotations on the folding and function of all missing proteins in the human proteome using cutting-edge bioinformatics methods, which should help to recognize possible protein-coding genes among the 'missing proteins' and to guide further protein characterization experiments.
Provides terminal tags of proteomes. ProteinCarta is a database storing over 50 residues from both termini of all amino acid sequences in the UniProt reference proteome data of the nine organisms analyzed. It requires only the amino acid sequence and the organism name as the input information. The search mode is useful, especially for identifying isoforms where the alternative sequences are located in the terminal regions.
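A terminal tag of the kind described for ProteinCarta can be illustrated with a short sketch. The function name `terminal_tags` is hypothetical, and the default length of 50 residues is only an assumption based on the figure quoted above; the database's actual extraction rules are not specified here.

```python
def terminal_tags(sequence, length=50):
    """Return the (N-terminal, C-terminal) tags of an amino acid sequence.

    Assumption: a tag is simply the first or last `length` residues.
    Sequences shorter than `length` yield the whole sequence for both tags.
    """
    seq = sequence.strip().upper()
    return seq[:length], seq[-length:]


if __name__ == "__main__":
    # Two made-up isoforms that differ only at the N-terminus.
    isoform_a = "MAAAK" + "G" * 100 + "KDEL"
    isoform_b = "MSSSK" + "G" * 100 + "KDEL"
    n_a, c_a = terminal_tags(isoform_a, length=10)
    n_b, c_b = terminal_tags(isoform_b, length=10)
    print(n_a != n_b, c_a == c_b)
```

Comparing tags rather than full sequences is what makes isoforms with alternative terminal regions easy to tell apart.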
Provides a centralized repository for published datasets. CrossCheck allows users to compare a user-defined list of gene symbols with the software reference database. The database contains over 600000 screen hits from published high-throughput screen datasets and low-throughput published information deposited into NCBI databases, as well as a novel predicted protein kinase substrate database.
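Comparing a user-defined list of gene symbols with a reference collection, as described for CrossCheck, amounts to a set intersection per dataset. The sketch below is only an illustration under that assumption; the function name `cross_check`, the dataset names and the data layout are hypothetical and do not reflect CrossCheck's actual implementation.

```python
def cross_check(user_symbols, reference):
    """Report which reference datasets contain any of the user's gene symbols.

    user_symbols: iterable of gene symbols supplied by the user.
    reference: dict mapping a dataset name to a collection of gene symbols.
    Matching is case-insensitive; empty symbols are ignored.
    """
    query = {s.strip().upper() for s in user_symbols if s.strip()}
    hits = {}
    for dataset, genes in reference.items():
        overlap = query & {g.upper() for g in genes}
        if overlap:
            hits[dataset] = sorted(overlap)
    return hits


if __name__ == "__main__":
    # Made-up reference datasets for illustration.
    reference = {
        "screen_A": {"TP53", "BRCA1", "MYC"},
        "screen_B": {"EGFR", "KRAS"},
    }
    print(cross_check(["tp53", " kras ", "ACTB"], reference))
```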