SaRAD / Simple and Robust Abbreviation Dictionary
Provides an easy to implement, high performance tool for the construction of a biomedical symbol dictionary. The algorithms, applied to the MEDLINE document set, result in a high quality dictionary and toolset to disambiguate abbreviation symbols automatically.
Sense inventories
An inventory of abbreviations and acronyms from clinical texts. Sense inventories created using clinical notes and medical dictionary resources demonstrate challenges with term coverage and resource integration.
GNS / Gene Name Service
Genomics researchers suffer from getting lost in the forest of gene aliases. Gene Name Service (GNS) solves this problem by providing a comprehensive alias resolution service for all widely used gene ID nomenclatures. Most importantly, in addition to a web-based interface, GNS also provides its services through a Web Service, which can be integrated into applications such as bioinformatics value-added databases, analysis pipelines or workflows.
A database of models of the genome of Mycobacterium tuberculosis (Mtb). The CHOPIN database assigns structural domains and generates homology models for 2911 sequences, corresponding to approximately 73% of the proteome. A sophisticated pipeline allows multiple models to be created using conformational states characteristic of different oligomeric states and ligand binding, such that the models reflect various functional states of the proteins. Additionally, CHOPIN includes structural analyses of mutations potentially associated with drug resistance.
A curated database of phosphorylation sites in 96 prokaryotic organisms, which belong to 11 phyla in two domains, bacteria and archaea. All the phosphorylation sites are annotated with original references and other descriptions in the database, which can be easily accessed through a user-friendly website interface with various search and browse options. The dbPSP database provides a comprehensive data resource for further studies of protein phosphorylation in prokaryotes.
3DGD / 3D Genome Database
A database that currently collects Hi-C data from four species for easy access to and visualization of chromatin 3D structure data. With the integration of other omics data such as genome-wide protein-DNA-binding data, this data source is useful for researchers interested in chromatin structure and its biological functions.
A general repository for chromatin interaction data. Records in 4DGenome are compiled through comprehensive literature curation of experimentally-derived and computationally-predicted interactions. The current release contains 4,433,071 experimentally-derived and 3,605,176 computationally-predicted interactions in 5 organisms. Experimental data cover both high throughput datasets and individual focused studies. All interaction data are freely available in a standardized file format. Records can be queried by genomic regions, gene names, organism, and detection technology.
Enables simultaneous comparisons between a wide range of data by combining major resources from human and vertebrate model organisms. Manteia performs several types of analyses as well as data retrieval, gene or probe set annotation, information content analysis or candidate gene prediction and prioritization. It aims to help in investigating the genetic origin of human diseases or identifying significant correlations in lists of genes and proteins generated by modern high-throughput techniques.
A curated database of human, mouse and rat miRNA/mRNA targets. miRGate is designed to analyze lists of miRNAs and gene isoforms under a common and consistent space of annotations, including all existing 3' UTRs and all known miRNAs. All Havana biotypes and ENCODE principal isoforms for the three organisms are also included.
A freely accessible web application and database that enables human mitochondrial genome researchers to study genetic variation in the mitochondrial genome through textual and graphical views, with a haplogroup assignment function for users who submit their own data. MitoVariome contains many kinds of variation features in the human mitochondrial genome and will be useful for understanding mitochondrial variation by individual, haplogroup, or geographical location to elucidate the history of human evolution.
STEPdb / STEP database
Contains a comprehensive characterization of subcellular localization and topology of the complete proteome of Escherichia coli. Two widely used E. coli proteomes (K-12 and BL21) are presented organized into thirteen subcellular classes. STEPdb exploits the wealth of genetic, proteomic, biochemical, and functional information on protein localization, secretion, and targeting in E. coli, one of the best understood model organisms. Subcellular annotations were derived from a combination of bioinformatics prediction, proteomic, biochemical, functional, topological data and extensive literature re-examination that were refined through manual curation.
FLAVIdB / Flavivirus Database
A database that combines antigenic data of flaviviruses, specialized analysis tools, and workflows for automated complex analyses focusing on applications in immunology and vaccinology. FLAVIdB represents a new generation of databases in which data and tools are integrated into a data mining infrastructure specifically designed to aid rational vaccine design by discovery of vaccine targets.
Contains over 590 complete flavivirus genome/protein sequences and information on known mutations and literature references. Each sequence has been manually annotated according to its date and place of isolation, phenotype and lethality. Internal tools are provided to rapidly determine relationships between viruses in Flavitrack and sequences provided by the user.
A portal for accessing the lineage and genotype information of influenza A viruses and a Web tool for determining lineages and genotypes of influenza A viruses. These features make FluGenome unique in its ability to automatically detect genotype differences attributable to reassortment events in influenza A virus evolution.
GISAID EpiFlu database
Provides a collection of influenza sequences with associated clinical and epidemiological metadata. The GISAID EpiFlu database is a resource that stores information about influenza viruses. It includes a data-sharing platform through which sequence data are recommended for inclusion in seasonal and pre-pandemic vaccines. These data are available to research scientists, public and animal health officials and the pharmaceutical industry.
MPIC / Mitochondrial Protein Import Components
Provides searchable information on the protein import apparatus of plant and non-plant mitochondria. An in silico analysis was carried out, comparing the mitochondrial protein import apparatus from 24 species representing various lineages from Saccharomyces cerevisiae (yeast) and algae to Homo sapiens (human) and higher plants, including Arabidopsis thaliana (Arabidopsis), Oryza sativa (rice) and other more recently sequenced plant species. Each of these species was extensively searched and manually assembled for analysis in the MPIC DB. The database presents an interactive diagram in a user-friendly manner, allowing users to select their import component of interest.
An inventory of genes encoding mitochondrial-localized proteins and their expression across 14 mouse tissues. Using the same strategy we have now reconstructed this inventory separately for human and for mouse based on (i) improved gene transcript models, (ii) updated literature curation, including results from proteomic analyses of mitochondrial sub-compartments, (iii) improved homology mapping and (iv) updated versions of all seven original data sets. The updated human MitoCarta2.0 consists of 1158 human genes, including 918 genes in the original inventory as well as 240 additional genes. The updated mouse MitoCarta2.0 consists of 1158 genes, including 967 genes in the original inventory plus 191 additional genes. The improved MitoCarta 2.0 inventory provides a molecular framework for system-level analysis of mammalian mitochondria.
Provides a comprehensive knowledgebase for mitochondrial proteome, interactome and human diseases. MitProNet features a user-friendly graphic visualization interface to present functional analysis of linkage networks. As an up-to-date database and analysis platform, MitProNet should be particularly helpful in comprehensive studies of complicated biological mechanisms underlying mitochondrial functions and human mitochondrial diseases.
A large-scale relational database that is automatically updated to keep pace with advances in mitochondrial proteomics and is curated to assure that the designation of proteins as mitochondrial reflects gene ontology (GO) annotations supported by high-quality evidence codes. A set of postulates is proposed to help define which proteins are authentic components of mitochondria. A web interface is provided to permit members of the mitochondrial research community to suggest modifications in protein annotations or mitochondrial status.
Allows the complete Ensembl gene database to be queried using phylogenetic patterns. PhyloPat offers the possibility of querying with binary phylogenetic patterns or regular expressions, or through a phylogenetic tree of the 39 included species. Users can also input a list of Ensembl, EMBL, EntrezGene or HGNC IDs to check which phylogenetic lineage any gene belongs to.
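Querying binary phylogenetic patterns with regular expressions can be illustrated with a minimal sketch. The species order, lineage names and patterns below are hypothetical and are not taken from PhyloPat; the sketch only assumes that each lineage is encoded as a string with one character per species, '1' for present and '0' for absent.

```python
import re

# Hypothetical species order: one character per species in every pattern.
SPECIES = ["human", "mouse", "chicken", "zebrafish", "fly"]

# Hypothetical presence/absence patterns ('1' = present, '0' = absent).
LINEAGES = {
    "lineage_A": "11111",
    "lineage_B": "11000",
    "lineage_C": "00011",
    "lineage_D": "11100",
}


def query_patterns(regex, lineages=LINEAGES):
    """Return the sorted names of lineages whose whole pattern matches the regex."""
    compiled = re.compile(regex)
    return sorted(
        name for name, pattern in lineages.items() if compiled.fullmatch(pattern)
    )


# Present in human and mouse, absent in fly, any state in chicken and zebrafish.
mammal_not_fly = query_patterns(r"11..0")
```

Using `fullmatch` keeps each character of the expression aligned with one species, so a short expression cannot match by accident in the middle of a longer pattern.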
A resource platform providing all basic features of a sequence database together with unique analysis tools that could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes in a user-friendly platform that enables analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser.
SpPress / Drosophila Spermatogenesis Expression Database
A public database containing genome-wide expression analysis of wild-type males using three cell populations isolated from mitotic, meiotic and post-meiotic phases of spermatogenesis in D. melanogaster.
A database providing candidate genes for reproductive research in pig by mining and processing the existing biological literature on human and pig. Based on text-mining and comparative genomics, ReCGiP presents diverse information on reproduction-relevant genes in human and pig. The genes are sorted by their degree of relevance to reproduction topics and visualized in a gene co-occurrence network in which two genes are connected if they are co-cited in a PubMed abstract.
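The co-citation rule described above (two genes are linked when they appear in the same abstract) can be sketched as follows. This is an illustration only, not ReCGiP's implementation; the gene names and abstracts are hypothetical placeholders, and counting co-citations as edge weights is an added assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstract_genes):
    """Count, for each gene pair, the number of abstracts mentioning both genes.

    abstract_genes: iterable of gene-name lists, one list per abstract.
    Returns a Counter keyed by alphabetically ordered (gene_a, gene_b) tuples.
    """
    edges = Counter()
    for genes in abstract_genes:
        # Deduplicate within an abstract, then sort so each pair has one key.
        for gene_a, gene_b in combinations(sorted(set(genes)), 2):
            edges[(gene_a, gene_b)] += 1
    return edges


# Hypothetical gene mentions extracted from three abstracts.
abstracts = [
    ["GENE_A", "GENE_B"],
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_C"],
]
edges = cooccurrence_edges(abstracts)
```

An abstract that mentions a single gene contributes no edge, so isolated genes appear in the network only if nodes are added separately.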
OKdb / Ovarian Kaleidoscope Database
Provides information regarding the biological function, expression pattern and regulation of genes expressed in the ovary. OKdb also contains information on gene sequences, chromosomal localization, human and murine mutation phenotypes and biomedical publication links.
A publicly available web-based SAGE database on male gonad development that covers six male mouse embryonic gonad stages, including E10.5, E11.5, E12.5, E13.5, E15.5 and E17.5. The sequence coverage of each SAGE library is beyond 150K, 'which is the most extensive sequence-based male gonadal transcriptome to date'. An interactive web interface with customizable parameters is provided for analyzing male gonad transcriptome information.
Provides a comprehensive platform to gather detailed information of experimentally verified and Greed AUC Stepwise (GAS)-predicted genes in spermatogenesis. SpermatogenesisOnline integrates the detailed information for 1666 genes that have been reported to be involved in spermatogenesis and 762 genes predicted by our GAS model (GAS probability >0.5) to participate in spermatogenesis. SpermatogenesisOnline 1.0 will help researchers to obtain a comprehensive understanding of complex biological mechanisms of spermatogenesis.
A resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches and alignments to assist applications in computational drug discovery and systems and chemical biology.
An open database which allows participants of Direct-To-Consumer genetic testing to publish their genetic data at no cost along with phenotypic information. Through this crowdsourced effort of collecting genetic and phenotypic information, openSNP has become a resource for a wide area of studies, including Genome-Wide Association Studies.
A wiki resource of the functional consequences of human genetic variation as published in peer-reviewed studies. Online since 2006 and freely available for personal use, SNPedia has focused on the medical, phenotypic and genealogical associations of single nucleotide polymorphisms. Entries are formatted to allow associations to be assigned to single genotypes as well as sets of genotypes (genosets).
PGP / Personal Genome Project
A charitable organization working to generate, aggregate and interpret human biological and trait data on an unprecedented scale. Open data is a critical component of the scientific method, but genomes are both identifiable and predictive. As a result, many studies choose to withhold data from participants and restrict access to researchers. The PGP's public data is a common ground to collaborate and improve our understanding of genomes.
Compiles data on experimentally validated, naturally occurring transcription factors binding sites (TFBS) across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data and fully customizable access to its records. CollecTF integrates multiple sources of data automatically and openly, allowing users to dynamically redefine binding motifs and their experimental support base.
A metazoan transcription factor and maternal factor resource specially designed for developmental biology studies. Using this web interface, users can browse, search, and download detailed information on species of interest, genes, transcription factor families, or developmental ontology terms.
A database and resource of protein families in Arthropod genomes. ProtoBug platform presents the relatedness of complete proteomes from 17 insects as well as a proteome of the crustacean, Daphnia pulex. The represented proteomes from insects include louse, bee, beetle, ants, flies and mosquitoes.
CPD / Cellular Phenotype Database
A repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology; and (iii) to facilitate development of analytical methods in this field.
Provides centralized local storage and access to completed archaeal and bacterial genomes. MicrobeDB creates a simple to use, easy to maintain, centralized local resource for various large-scale comparative genomic analyses and a back-end for future microbial application design.
Follicle Online
A web-based database system for storing and retrieving folliculogenesis-related experimental data. It provides detailed information for 580 genes/proteins (from 23 model organisms, including Homo sapiens, Mus musculus, Rattus norvegicus, Mesocricetus auratus, Bos taurus, Drosophila and Xenopus laevis) that have been reported to be involved in folliculogenesis, POF (premature ovarian failure) and PCOS (polycystic ovary syndrome).
A comprehensive resource of neuropeptides, which holds 5949 non-redundant neuropeptide entries originating from 493 organisms belonging to 65 neuropeptide families. In NeuroPep, the number of neuropeptides in invertebrates and vertebrates is 3455 and 2406, respectively. It is currently the most complete neuropeptide database. In addition, user-friendly web tools like browsing, sequence alignment and mapping are also integrated into the NeuroPep database.
An endogenous peptide database to aid mass spectrometric identifications. In the identification process the experimental peptide masses are compared with the peptide masses stored in SwePep both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides and can later be confirmed with tandem mass spectrometry data.
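The mass comparison described above, with and without post-translational modifications, can be sketched as a simple tolerance search. The peptide names, stored masses and tolerance are hypothetical and do not come from SwePep; the two mass shifts are the standard monoisotopic values for acetylation and phosphorylation.

```python
# Hypothetical stored peptides: name -> monoisotopic mass in Da.
STORED_PEPTIDES = {"peptide_1": 1000.50, "peptide_2": 1500.75}

# Monoisotopic mass shifts in Da; "none" covers the unmodified peptide.
PTM_SHIFTS = {"none": 0.0, "acetylation": 42.0106, "phosphorylation": 79.9663}


def match_mass(observed, tolerance=0.01):
    """Return (peptide, modification) pairs whose mass is within tolerance (Da)."""
    hits = []
    for name, mass in STORED_PEPTIDES.items():
        for ptm, shift in PTM_SHIFTS.items():
            if abs(observed - (mass + shift)) <= tolerance:
                hits.append((name, ptm))
    return hits


# An observed mass consistent with acetylated peptide_1.
candidates = match_mass(1042.5106)
```

A hit here is only a candidate; as the entry notes, confirmation would come from tandem mass spectrometry data.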
Catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes.
Consists of a library of biological parts from the database of plasmid features. GenoLIB was designed using the synthetic biology open language (SBOL), an emerging standard developed to organize libraries of genetic parts to facilitate synthetic biology workflows. This database supports unambiguous annotation of plasmid sequences. It is indexed with a combination of automatic and manual curation methods for the determination of feature sequences, borders and functional descriptions.
A multi-species database to disentangle the SNP chip jungle. Features of SNPchiMp include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). This platform allows easy integration and standardization, and it is aimed at both industry and research. It also enables users to easily link the information available from the array producer with data in public databases, without the need of additional bioinformatics tools or pipelines.
Wheat microRNA Portal
Provides a broad repertoire of hexaploid wheat miRNAs associated with abiotic stress responses, tolerance and development. These valuable resources of expressed wheat miRNAs will help in elucidating the regulatory mechanisms involved in freezing and aluminum responses and tolerance mechanisms as well as for development and flowering.
LigASite / LIGand Attachment SITE
A gold-standard dataset of binding sites in 550 proteins of known structures. LigASite consists exclusively of biologically relevant binding sites in proteins for which at least one apo- and one holo-structure are available. The website interface allows users to search the dataset by PDB identifiers, ligand identifiers, protein names or sequence, and to look for structural matches as defined by the CATH homologous superfamilies. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.
HTT-DB / Horizontally Transferred Transposable Elements Database
Allows easy access to all known cases of horizontal transfer of transposable elements (HTT) reported along with rich information about each case. Moreover, it allows the user to generate tables and graphs based on searches using TEs and/or host species classification and export them in several formats.
DBAASP / Database of Antimicrobial Activity and Structure of Peptides
A manually curated database for those peptides for which antimicrobial activity against particular targets has been evaluated experimentally. The database is a depository of complete information on: the chemical structure of peptides; target species; target object of cell; peptide antimicrobial/haemolytic/cytotoxic activities; and experimental conditions at which activities were estimated. The DBAASP search page allows the user to search peptides according to their structural characteristics, complexity type (monomer, dimer and two-peptide), source, synthesis type (ribosomal, nonribosomal and synthetic) and target species. The database prediction algorithm provides a tool for rational design of new antimicrobial peptides.
The Functional lncRNA Database
A repository of mammalian long non-protein-coding transcripts that have been experimentally shown to be both non-coding and functional. To search for a specific lncRNA, enter its name and choose the appropriate species. Alternatively, you can browse all the lncRNAs at once. Currently the database contains lncRNAs from Human, Mouse and Rat.
A genome-wide transcription atlas of miRNAs in grapevine, analyzing the spatio-temporal distribution of known and newly discovered miRNAs, in the widest range of grapevine samples considered thus far. miRVine aims at becoming the reference for the future development of targeted functional studies, a first indispensable step towards the definition of miRNA involvement in grapevine development.
An integrative database of Arabidopsis thaliana miRNAs and their target genes, expression profiles, function annotations and pathways. A friendly web interface is provided to browse and analyze the data. We believe that miRFANs is a useful platform for exploring the regulatory functions of Arabidopsis thaliana miRNAs and can provide considerable value for many researchers.
Provides expert-curated molecular interactions between successful and potential drugs and their targets in the human genome. The information in the database is presented at two levels: the initial view or landing pages for each target family provide expert-curated overviews of the key properties and selective ligands and tool compounds available. For selected targets more detailed introductory chapters for each family are available along with curated information on the pharmacological, physiological, structural, genetic and pathophysiological properties of each target. The database is enhanced with hyperlinks to additional information in other databases including Ensembl, UniProt, PubChem, ChEMBL and DrugBank, as well as curated chemical information and literature citations in PubMed.
PDSP Ki database
A unique resource in the public domain which provides information on the abilities of drugs to interact with an expanding number of molecular targets. The Ki database serves as a data warehouse for published and internally-derived Ki, or affinity, values for a large number of drugs and drug candidates at an expanding number of G-protein coupled receptors, ion channels, transporters and enzymes.
Contains 3D structural models of 1,026 putative G protein-coupled receptors (GPCRs) in the human genome generated by the GPCR-I-TASSER pipeline. In GPCR-I-TASSER, the GPCR sequences are first threaded through the GPCR template library to identify multiple structure templates by the LOMETS programs. When close homologous templates are identified, full-length models are constructed by the I-TASSER based fragment assembly simulations, assisted by a GPCR and membrane specific force field and spatial restraints collected from mutagenesis experiments in GPCR-RD.
A database for experimentally solved GPCR structures. GPCR-EXP is a manually curated database that contains all G protein-coupled receptors that have been solved so far. The database is updated weekly. Each entry contains information of PDB ID, resolution, release date, biological name and literature associated with the GPCR.
A database for experimental restraints of GPCRs. GPCRRD is designed to systematically collect all experimental restraints (including residue orientation, contact and distance maps) available from the literature and primary GPCR resources using an automated text mining algorithm combined with manual validation, with the purpose of assisting GPCR 3D structure modeling and function annotation. The current dataset contains thousands of spatial restraints from mutagenesis, disulfide mapping distances, electron cryo-microscopy and Fourier-transform infrared spectroscopy experiments.
GLASS / GPCR-Ligand Association
Aims to provide a comprehensive, manually-curated resource for experimentally validated GPCR-ligand associations. A new text-mining algorithm was proposed to collect GPCR-ligand interactions from the biomedical literature, which are then crosschecked against five primary pharmacological datasets to enhance the coverage and accuracy of GPCR-ligand association data. A special architecture has been designed to allow users to perform homologous ligand searches with flexible bioactivity parameters. The current database contains approximately 500,000 unique entries, of which the vast majority stem from ligand associations with rhodopsin- and secretin-like receptors.
GPCR-OKB / GPCR-Oligomerization Knowledge Base
A system that supports browsing and searching for GPCR oligomer data. Such data were manually derived from the literature. While focused on GPCR oligomers, GPCR-OKB is seamlessly connected to GPCRDB, facilitating the correlation of information about GPCR protomers and oligomers.
A database which currently holds information about 713 human GPCRs, 36 human G-proteins and 99 human effectors. The collection of information about the interactions between these molecules was done manually and the current version of Human-gpDB holds information for about 1663 connections between GPCRs and G-proteins and 1618 connections between G-proteins and effectors. Major advantages of Human-gpDB are the integration of several external data sources and the support of advanced visualization techniques.
WGE / Wellcome Trust Sanger Institute Genome Editing database
Uses methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks.
GPCR NaVa database
Integrates data on natural variants in human GPCRs from online databases, the scientific literature, and patents. Where available, variants contain information on their location in the DNA (and protein sequence), the involved nucleotides (and amino acids), the average frequency of each allele, reported disease associations, and references to public databases and the scientific literature. The GPCR NaVa database aims to facilitate studies into pharmacogenetics, genotype-phenotype, and structure-function relationships of GPCRs.
Provides the first comprehensive web-based and open-access lncRNA catalogue for three key male germ cell stages: type A spermatogonia, pachytene spermatocytes and round spermatids. This information has been developed by integrating male germ transcriptome resources derived from RNA-Seq, tiling microarray and GermSAGE. Characterizations of lncRNA-associated regulatory features, potential coding gene targets and microRNA targets are also provided.
GraP / Platform of Functional genomics analysis in Gossypium raimondii
Provides an integration, multi-dimensional analysis and visualization platform for cotton functional genomics research. GraP includes updated functional annotation, gene family classifications, protein–protein interaction networks, co-expression networks and microRNA–target pairs. Moreover, gene set enrichment analysis and cis-element significance analysis tools are also provided for gene batch analysis of high-throughput data sets.
A manually curated compilation of molecularly characterized genes that are involved in drought stress response. DroughtDB includes information about the originally identified gene, its physiological and/or molecular function and mutant phenotypes and provides detailed information about computed orthologous genes in nine model and crop plant species including maize and barley. All identified orthologs are interlinked with the respective reference entry in MIPS/PGSB PlantsDB, which allows retrieval of additional information like genome context and sequence information. Thus, DroughtDB is a valuable resource and information tool for researchers working on drought stress and will facilitate the identification, analysis and characterization of genes involved in drought stress tolerance in agriculturally important crop plants.
CPTAC Data Portal
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs) for the CPTAC program. The portal also hosts analyses of the mass spectrometry data (mapping of spectra to peptide sequences and protein identification) from the PCCs and from a CPTAC-sponsored common data analysis pipeline (CDAP).
LIP / Loops In Proteins
Includes all protein segments of a length up to 15 residues contained in the Protein Data Bank (PDB). Searching the database for loop candidates takes less than 1 s on a desktop PC, and ranking them takes a few minutes.
A structural classification of loops extracted from known protein structures. The structural classification is based on the geometry and conformation of the loop. The geometry is defined by four internal variables and the type of regular flanking secondary structures, resulting in 10 different loop types. The new version of ArchDB features a novel, fast and user-friendly web-based interface, and a novel graph-based, computationally efficient, clustering algorithm. The current version of ArchDB classifies 149,134 loops in 5739 classes and 9608 subclasses.
SPIKE / Signaling Pathways Integrated Knowledge Engine
A database of highly curated human signaling pathways with an associated interactive software tool. Users can view and download individual pathway maps and browse the entire database from this website, or launch a map viewer tool that allows dynamic visualization of the database and save networks in XGMML format that can be viewed in all generic XGMML viewers.
Serves as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
A public database for force-field parameters with a special emphasis on lipids, detergents and similar molecules that are of interest when simulating biological membrane systems. Lipidbook stores parameter files that are supplied by the community. Topologies, parameters and lipid or whole bilayer structures can be deposited in any format for any simulation code, preferably under a license that promotes "open knowledge."
The database contains curated publications about positive selection in different human populations, covering over 15,000 loci drawn either from publications that study positively selected genomic loci and genes related to specific functions, traits or diseases, or from publications that detect genome-wide selective signals with different statistical methods.
SNP@Ethnos
A public database of ethnically variant single-nucleotide polymorphisms (ESNPs) with annotation information that currently contains 100 736 ESNPs from 10 138 genes. The SNP@Ethnos database can be queried using gene symbols, RefSeq mRNA IDs, dbSNP rs numbers and lists containing multiple genes.
SNP@Evolution
An integrative and hierarchical database focusing on positive selection in the human genome. SNP@Evolution is a useful resource for finding and verifying signals of natural selection. It provides two user interfaces, data query and data visualization.
DSigDB / Drug Signatures Database
A collection of drug and small molecule related gene sets based on quantitative inhibition and/or drug-induced gene expression changes data. DSigDB allows users to search, view, and download drugs/compounds and gene sets. DSigDB gene sets provide seamless integration to GSEA software for linking gene expressions with drugs/compounds for drug repurposing and translational research.
A knowledge resource for lipids and their biology. SwissLipids provides curated knowledge of lipid structures and metabolism which is used to generate an in silico library of feasible lipid structures. These are arranged in a hierarchical classification that links mass spectrometry analytical outputs to all possible lipid structures, metabolic reactions, and enzymes. SwissLipids provides a reference namespace for lipidomic data publication, data exploration and hypothesis generation.
ESCAPE / Embryonic Stem Cell Atlas from Pluripotency Evidence
A mouse and human embryonic stem cells (m/hESC)-centered database integrating data from many recent diverse high-throughput studies including chromatin immunoprecipitation followed by deep sequencing, genome-wide inhibitory RNA screens, gene expression microarrays or RNA-seq after knockdown (KD) or overexpression of critical factors, immunoprecipitation followed by mass spectrometry proteomics and phosphoproteomics. The database provides web-based interactive search and visualization tools that can be used to build subnetworks and to identify known and novel regulatory interactions across various regulatory layers. The web-interface also includes tools to predict the effects of combinatorial KDs by additive effects controlled by sliders, or through simulation software implemented in MATLAB.
A database for human phosphovariants. PhosphoVariant can be used in pathophysiological studies of mutations and in the selection of polymorphisms of clinical and phenotypical importance. The screening and prediction of phosphovariants can be a starting point for further research.
CEG / Cluster of Essential Genes
Contains clusters of orthologous essential genes. Based on the size of a cluster, users can easily decide whether an essential gene is conserved in multiple bacterial species or is species-specific. CEG contains the similarity value of every essential gene cluster against human proteins or genes. Properties contained in the CEG database, such as cluster size, and the similarity of essential gene clusters against human proteins or genes, are very important for evolutionary research and drug design.
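The idea of reading conservation off a cluster can be sketched as follows. This is an illustrative example only: the cluster identifiers, species names, gene names, the `classify_clusters` helper and the `min_species` threshold are all hypothetical, and "size" is interpreted here as the number of distinct species contributing to a cluster, which may differ from how CEG defines it.

```python
def classify_clusters(clusters, min_species=2):
    """Label each cluster as 'conserved' or 'species-specific'.

    clusters maps a cluster id to a list of (species, gene) pairs.
    A cluster counts as conserved when at least `min_species`
    distinct species contribute a gene (an assumed criterion).
    """
    labels = {}
    for cluster_id, members in clusters.items():
        species = {sp for sp, _gene in members}
        if len(species) >= min_species:
            labels[cluster_id] = "conserved"
        else:
            labels[cluster_id] = "species-specific"
    return labels


# Made-up toy data, not taken from CEG.
TOY_CLUSTERS = {
    "cluster_1": [("species_A", "gene_x"), ("species_B", "gene_x"), ("species_C", "gene_x")],
    "cluster_2": [("species_A", "gene_y")],
}

if __name__ == "__main__":
    for cluster_id, label in classify_clusters(TOY_CLUSTERS).items():
        print(cluster_id, label)
```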
DEG / Database of Essential Genes
Essential genes are those indispensable for the survival of an organism, and therefore are considered a foundation of life. DEG hosts records of currently available essential genomic elements, such as protein-coding genes and non-coding RNAs, among bacteria, archaea and eukaryotes. Essential genes in a bacterium constitute a minimal genome, forming a set of functional modules that play key roles in the emerging field of synthetic biology.
IFIM / Integrates quantitative Fitness Information for Microbial genes
Provides integrated microbial fitness data from both experiments and computational simulations. The integrated fitness data in IFIM originate from experiments of single-gene deletion mutants, libraries of transposon integrations and computational simulations using Geptop. IFIM overlaps with the existing DEG, OGEE and CEG databases, but differs from them in that it provides quantitative fitness values for the genes.
OGEE / Online GEne Essentiality database
Its main purpose is to enhance our understanding of the essentiality of genes. This is achieved by collecting not only experimentally tested essential and non-essential genes, but also associated gene features such as expression profiles, duplication status, conservation across species, evolutionary origins and involvement in embryonic development.
A central repository for plasmid clones and collections. DNASU focuses both on storing valuable plasmids that researchers and consortiums have created and on distributing these highly-annotated plasmids and plasmid collections to researchers worldwide. The goal is for researchers to avoid the time and effort needed to reclone genes so that they can focus immediately on their experiments, thus accelerating the pace of discovery.
PlasmID / Plasmid Information Database
A community-based resource portal to facilitate search and request of plasmid clones shared with the Dana-Farber/Harvard Cancer Center (DF/HCC) DNA Resource Core. PlasmID serves as a central data repository and enables researchers to search the collection online using common gene names and identifiers, keywords, vector features, author names and PubMed IDs.
BEI Resources / Biological and Emerging Infections Resources
A centralized biological resource center (BRC) for research reagents to the scientific community. The primary role of BEI Resources is the acquisition and authentication of several categories of materials for registered scientists, with a focus on emerging and re-emerging infectious diseases. A major goal of this BRC is to provide reference standards to scientists carrying out basic research to develop improved diagnostic tests, vaccines, and therapies.
SGDB / Synthetic Gene Database
A relational database that houses sequences and associated experimental information on synthetic (artificially engineered) genes from all peer-reviewed studies published to date. At present, the database comprises information from more than 200 published experiments. SGDB not only provides reference material to guide experimentalists in designing new genes that improve protein expression, but also offers a dataset for analysis by bioinformaticians who seek to test ideas regarding the underlying factors that influence gene expression.
Codon Usage Database
Contains codon usage frequencies for 3,027,973 complete protein coding genes in 35,799 organisms. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These tools provide users with the ability to further analyze for variations in codon usage among different genomes.
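To make the notion of a codon usage table concrete, the following minimal sketch counts codons in a single coding sequence and reports each as a frequency per thousand codons, a common way of presenting such tables. The `codon_usage_per_thousand` helper and the toy sequence are illustrative assumptions and do not reproduce how the database itself was compiled.

```python
from collections import Counter


def codon_usage_per_thousand(cds):
    """Return {codon: frequency per 1000 codons} for one coding sequence.

    The sequence is read in frame from its first base; trailing bases
    that do not form a full codon are ignored. RNA input (U) is
    converted to DNA letters (T).
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    total = len(codons)
    if total == 0:
        return {}
    counts = Counter(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence: ATG GCT GCT TAA
    for codon, freq in sorted(codon_usage_per_thousand("ATGGCTGCTTAA").items()):
        print(codon, freq)
```

Comparing such tables across genes or genomes is one simple way to look for the variations in codon usage mentioned above.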
A database of biological components for synthetic biology. CompuBioTicDB stores biological molecules, such as proteins and metabolites, as well as devices such as sensors, switches or timekeepers, which use the molecules as basic parts.
DOQCS / Database Of Quantitative Cellular Signaling
A repository of models of signaling pathways. DOQCS is intended both to serve the growing field of chemical-reaction level simulation of signaling networks, and to anticipate issues in large-scale data management for signaling chemistry.
CellML Model Repository
Provides free access to biological models. The vast majority of these models are derived from published, peer-reviewed papers. Model curation is an important and ongoing process to ensure the CellML model is able to accurately reproduce the published results. As the CellML community grows, and more people add their models to the repository, model annotation will become increasingly important to facilitate data searches and information retrieval.
MACPAK / Macrophage Pathway Knowledgebase
A computational system that allows biomedical researchers to query and study the dynamic behaviors of macrophage molecular pathways. MACPAK integrates the knowledge of 230 reviews that were carefully checked by specialists for their accuracy and then converted to 230 dynamic mathematical pathway models.
PsyGeNET / Psychiatric disorders and Genes association NETwork
Constitutes a resource on psychiatric diseases and their associated genes. PsyGeNET consists of a database and analysis tools. It contains information on depression, bipolar disorder, alcohol use disorders and cocaine use disorders. The database was developed by applying text mining tools to extract information from the scientific literature. A document describing the curation guidelines, which serves as a resource for the development and evaluation of text mining systems, is available on the web portal.
Covers cellular processes in Mycoplasma genitalium. WholeCellKB is a repository of whole-cell models that mainly focuses on assisting users in performing whole-cell simulations. Information is organized into more than 10 sections, including specifications of the subcellular organization, the binding sites and footprint of every DNA-binding protein, and the organization and promoter of each transcription unit.
Ebola-KB / Ebola virus-centered Knowledge Base
An Ebola virus-centered Knowledge Base using linked data and semantic web technologies. Ebola-KB aggregates knowledge from several open data sources, web services and biomedical ontologies. This knowledge is transformed to RDF, linked to the Bio2RDF datasets and made available through a SPARQL 1.1 endpoint. Ebola-KB can also be explored using an interactive dashboard visualizing the different perspectives of this integrated knowledge.
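As a hedged sketch of how a SPARQL endpoint of this kind is typically queried, the example below builds a GET request URL in the style of the SPARQL 1.1 Protocol, where the query text is passed in a `query` parameter. The endpoint address is a placeholder (the real Ebola-KB URL is not given here), the generic triple pattern is not specific to the Ebola-KB schema, and `build_sparql_get_url` is a hypothetical helper. No request is actually sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Placeholder endpoint; substitute the actual SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"

# Generic query that lists ten arbitrary triples.
QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def build_sparql_get_url(endpoint, query):
    """Return a URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    url = build_sparql_get_url(ENDPOINT, QUERY)
    print(url)
    # Round-trip check: decoding the URL gives back the original query.
    print(parse_qs(urlparse(url).query)["query"][0] == QUERY)
```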
Integrates information related to metabolic pathways. kpath also provides a navigational interface that enables not only browsing, but also deeper use of the integrated data to build metabolic networks based on existing dispersed knowledge. This user interface has been used to showcase relationships that can be inferred from the information available in several public databases.
A web-based genomics platform for histone-modifying enzymes (HMEs) built using HMM sequence profiles. dbHiMo provides users with web-based personalized data browsing and analysis tools, supporting comparative and evolutionary genomics. With comprehensive data entries and associated web-based tools, dbHiMo will be a valuable resource for future epigenetics/epigenomics studies.
Provides easy access to a total of 3,103 known regulations in C. glutamicum ATCC 13032 and M. tuberculosis H37Rv and to 38,940 evolutionarily conserved interactions for 18 non-model species of the CMNR group. This makes CMRegNet to date the most comprehensive database of regulatory interactions of CMNR bacteria. CMRegNet is accessible through a user-friendly online interface.
A unified global portal for deposition and retrieval of 3DEM density maps, atomic models, and associated metadata, as well as a resource for news, events, software tools, data standards and validation methods for the 3DEM community. EMDataBank unifies public access to the two major archives containing EM-based structural data: EM Data Bank (EMDB) and Protein Data Bank (PDB), and facilitates use of EM structural data of macromolecules and macromolecular complexes by the wider scientific community.
GSR / Genetic Simulation Resources
A website provided by the National Cancer Institute (NCI) that aims to help researchers compare and choose the appropriate simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with well-defined attributes, thus allowing site users to search and compare simulators according to specified features.
HPID / Human Protein Interaction Database
It was designed (1) to provide human protein interaction information pre-computed from existing structural and experimental data, (2) to predict potential interactions between proteins submitted by users and (3) to provide a depository for new human protein interaction data from users. Two types of interaction are available from the pre-computed data: (1) interactions at the protein superfamily level and (2) those transferred from the interactions of yeast proteins.
MiDAS / Microbial Database for Activated Sludge
Gathers information about the morphology, ecophysiology, abundance and distribution of genus members in full-scale treatment systems with phylotype identity. MiDAS was primarily a taxonomic database curated for abundant and process-important phylotypes in activated sludge wastewater treatment systems with biological nutrient removal. The repository also includes the organisms of the anaerobic digestion community and the most abundant influent wastewater organisms.
HCSD / Human Cancer Secretome Database
A comprehensive database for human cancer secretome data. The cancer secretome comprises the proteins secreted by cancer cells, and structuring information about it will enable further analysis of how it relates to tumor biology. The results are visualized in an explicit and interactive manner. A result page includes, for example, annotations, cross references, cancer secretome data and secretory features for each identified protein.
SInCRe / Structural Interactome Computational Resource
An integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein-protein and protein-small molecule interactions. The motivation for developing this database is to provide an integrated platform that allows easy access to, and interpretation of, data and results obtained by all the groups in CamBan in the field of Mtb informatics.
Automates the detection of mutations and the extraction of mutation–gene pairs. The result is a database of such pairs. MEMA identified 24 351 singleton mutations in conjunction with a HUGO gene name out of 16 728 abstracts.
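The mutation-detection step can be pictured with a deliberately simple regular expression. The sketch below only finds point mutations written in one-letter amino acid notation (wild-type residue, position, mutant residue) and does not attempt gene pairing. It is not MEMA's actual method, whose patterns are not described here; the `find_mutations` helper and the example sentence are illustrative.

```python
import re

# The 20 standard amino acids in one-letter code.
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Wild-type residue, numeric position, mutant residue, e.g. "V600E".
ONE_LETTER = re.compile(rf"\b([{AA1}])(\d+)([{AA1}])\b")


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples found in the text."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in ONE_LETTER.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "The V600E and G12D substitutions were observed."
    print(find_mutations(sentence))
```

A pattern this loose will also match unrelated tokens of the same shape (for example some cell line or reagent names), so a real extraction pipeline needs further filtering before pairing mutations with gene names.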