COSMIC / Catalogue Of Somatic Mutations In Cancer
Enables exploration of the impact of somatic mutations in human cancer. COSMIC is a database system that collects somatic mutation data from a variety of public sources into one standardized repository and makes them easily explorable in a variety of graphical, tabulated and downloadable ways. The database encompasses all forms of human cancer. COSMIC is built primarily via curation of published literature by expert scientists. The database is also available for download in multiple formats.
A microsatellite database of commercially important fishes and shellfishes of the Indian subcontinent.
dbNSFP / database for nonsynonymous SNPs’ functional predictions
Eases the process of filtering and prioritizing the presumably functional single nucleotide variants (SNVs) from a long list of SNVs identified in a typical whole exome sequencing (WES) study. dbNSFP can work as a local and self-sustaining database without need for internet connection. The database provides more than 82 800 000 non-synonymous SNVs (nsSNVs) and splice site SNVs (ssSNVs).
A database and a web interface that is designed to fill the annotation gap left by the high cost of experimental testing for functional significance of protein variants.
Provides information about phenotyping human single nucleotide polymorphisms (SNPs). SNPeffect is a database that allows users to search SNVs by filtering on molecular phenotypic effects, mutation type, disease, UniProt identifier, dbSNP identifier and gene name. Users can also analyze and plot phenotypic features of a specific subset (or all) of the SNPeffect database. It also includes a data submission framework for submitting (human or non-human) custom single protein variants for a detailed SNPeffect analysis including TANGO, WALTZ, LIMBO and FoldX.
This site produces an interactive visualization of disease and non-disease associated non-synonymous single nucleotide polymorphisms (nsSNPs) and displays geometric and relative entropy calculations.
Felsenstein’s website
Phylogeny programs page describing all known software for inferring phylogenies (evolutionary trees).
IGSR / International Genome Sample Resource
Expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups.
Provides germline and somatic variants of any size, type or genomic location. ClinVar gathers interpretations of clinical significance of variants for reported conditions. It accepts submissions from clinical testing labs, researchers, locus-specific databases, other databases, expert panels and groups establishing professional guidelines from all countries. This database offers general and advanced query interfaces.
dbGaP / database of Genotypes and Phenotypes
Archives and distributes the results of genotype-phenotype studies. dbGaP provides genomic data from cohort studies, clinical trials and other studies. It is widely used for sharing individual-level data and summary-level data such as allele frequencies. Data include genotype, phenotype, exposure, expression array, epigenomic and pedigree data from genome-wide association studies (GWAS), sequencing studies and other large-scale genomic studies.
dbSNP / database of Short Genetic Variations
Provides a public repository for genetic variation. dbSNP includes disease-causing clinical mutations as well as neutral polymorphisms. It links variations (polymorphisms and clinical mutations) to NCBI sequence resources via BLAST and E-PCR analysis. It also facilitates searches along five major axes of information: (i) sequence location, (ii) function, (iii) cross-species homology, (iv) single nucleotide polymorphism (SNP) quality or validation status and (v) degree of heterozygosity (degree of population variation).
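The "degree of heterozygosity" axis can be illustrated with a small calculation. The sketch below uses the standard expected-heterozygosity formula (one minus the sum of squared allele frequencies, under Hardy-Weinberg assumptions); it is only an illustration, and the function name and example frequencies are hypothetical rather than taken from dbSNP, whose own computation may differ.

```python
def expected_heterozygosity(allele_frequencies):
    """Return 1 - sum(p_i^2) for a list of allele frequencies.

    Assumption: frequencies describe all alleles at one site and sum to 1.
    """
    total = sum(allele_frequencies)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are different.
    return 1.0 - sum(p * p for p in allele_frequencies)


# Hypothetical biallelic SNP with a minor allele frequency of 0.2.
example = expected_heterozygosity([0.8, 0.2])
```

A monomorphic site gives 0, and a biallelic site reaches its maximum of 0.5 when both alleles are equally frequent.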
DECIPHER / DatabasE of Genomic variants and Phenotype in Humans Using Ensembl Resources
DGV / Database of Genomic Variants
Provides a comprehensive summary of structural variation in the human genome. Structural variation is defined here as genomic alterations that involve segments of DNA larger than 50 bp. The database contains only structural variation identified in healthy control samples. The Database of Genomic Variants provides a useful catalog of control data for studies aiming to correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer-reviewed research studies.
Collects peer-reviewed, expert-authored and edited descriptions of inherited diseases. GeneReviews contains phenotypic information and information on selected variants, and its strength is in the clinical summaries it offers. The entries follow a standard format and focus on the clinical aspects of the disease including diagnosis, management and counseling. Unlike the Online Mendelian Inheritance in Man (OMIM) format, GeneReviews reads like a medical textbook.
GTEx / Genotype-Tissue Expression
Provides a resource that allows investigations of the relationship among genetic variation, gene expression, and other molecular phenotypes in multiple human tissues. GTEx gives a unified view of genetic effects on gene expression across a broad range of tissue types, most of which have not previously been studied for expression quantitative trait loci (eQTLs). This dataset was constructed on the basis of a collection of 900 donors.
International HapMap Project
A multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs.
LSDBs / Locus-Specific DataBases
Links to databases that hold information on gene sequence variation associated with human phenotypes. LSDBs aims to provide a registry, as exhaustive as possible, of the available online resources. Each recorded repository includes data such as a direct URL for accessing the repository, the curators' names, any linked diseases, whether submissions from external users are accepted, and the last update date.
LOVD / Leiden Open Variation Database
Provides a flexible, freely available tool for gene-centered collection and display of DNA variations. LOVD 3.0 extends this idea to also provide patient-centered data storage and storage of NGS data, even of variants outside of genes. LOVD allows users to link large numbers of DNA variants in one or more genes to an individual (multi-gene disorders or large-scale next-generation sequencing). You can even use LOVD on your personal computer to browse through the variants in your own exome/genome. To maintain a high quality of the data stored, LOVD connects with various resources, like HGNC, NCBI, EBI and Mutalyzer.
Exome Variant Server
The NHLBI Exome Sequencing Project (ESP) is focused on understanding the contribution of rare genetic variation to heart, lung and blood disorders through the sequencing of well-phenotyped populations. Variant count data are available on the Exome Variant Server, which currently contains exome sequence data on 6503 individuals, and allele frequencies are provided for African-Americans and European-Americans.
OMIM / Online Mendelian Inheritance in Man
Offers an online catalog of human genes and genetic disorders. OMIM integrates genomic coordinate searches of the gene map, views of genetic heterogeneity of phenotypes, and side-by-side comparisons of clinical synopses. It focuses on the molecular relationship between genetic variation and phenotypic expression. This database is based on the published peer-reviewed biomedical literature. It is useful for clinicians, molecular biologists and genome scientists.
PhenCode / Phenotypes for ENCODE
A collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. PhenCode connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser.
SNP and indel Imputability
A publicly available SNP and indel imputability database, aiming to provide direct access to imputation accuracy information for markers identified by the 1000 Genomes Project across four major populations and covering multiple GWAS genotyping platforms. SNP and indel imputability information can be retrieved through a user-friendly interface by providing the ID(s) of the desired variant(s) or by specifying the desired genomic region. The query results can be refined by selecting relevant GWAS genotyping platform(s).
HGMD / Human Gene Mutation Database
Compiles information related to disease-related functional genetic variation in the germline. HGMD includes more than 200000 mutation entries, manually curated, obtained from the scientific literature. The database can be consulted in two formats: a public and a commercial version providing additional features. It can be used for supporting the pathological authenticity and/or novelty of detected gene lesions, establishing an overview of the mutational spectra for specific genes.
Aims to be the microRNA (miRNA) portal encompassing microRNA diversity, expression profiles, target relationships, and various supporting tools. By keeping datasets and analytic tools up-to-date, miRGator should continue to serve as an integrated resource for biogenesis and functional investigation of miRNAs.
A database of manually curated dSNPs on the 3’UTRs of human genes from available publications in PubMed.
PolymiRTS Database
An integrated platform for analyzing the functional impact of genetic polymorphisms in miRNA seed regions and miRNA target sites. The browse and search pages of PolymiRTS allow users to explore the relations between the PolymiRTSs and gene expression traits, physiological and behavioral phenotypes, human diseases and biological pathways.
Provides predicted microRNA (miRNA) targets on the largest available set of human lncRNAs. LncBase additionally offers a comprehensive collection of computationally predicted miRNA recognition elements (MREs) on mouse lncRNAs. It includes miRNA–lncRNA interactions supported by experimental data for both human and mouse species. This database is composed of two modules: (i) to explore computationally predicted MREs of DIANA-microT-CDS, and (ii) to explore experimentally verified target sites.
Provides thousands of high-quality, manually curated, experimentally validated miRNA:gene interactions, enhanced with detailed metadata. DIANA-TarBase helps users identify positive or negative experimental results, the experimental methodology used, and experimental conditions including cell/tissue type and treatment. Its interface also supplies advanced information ranging from the binding site location, as identified experimentally as well as in silico, to the primer sequences used for cloning experiments.
MicroCosm Targets
A web resource developed by the Enright Lab at the EMBL-EBI containing computationally predicted targets for microRNAs across many species. The miRNA sequences are obtained from the miRBase Sequence database and most genomic sequence from EnsEMBL.
A comprehensive resource of microRNA target predictions and expression profiles. Target predictions are based on a development of the miRanda algorithm which incorporates current biological knowledge on target rules and on the use of an up-to-date compendium of mammalian microRNAs. MicroRNA expression profiles are derived from a comprehensive sequencing project of a large set of mammalian tissues and cell lines of normal and disease origin.
An online resource for miRNA target prediction and functional annotations. In addition to presenting precompiled prediction data, a new feature is the web server interface that allows submission of user-provided sequences for miRNA target prediction. In this way, users have the flexibility to study any custom miRNAs or target genes of interest.
Provides access to miRNA-Target predictions for Drosophila miRNAs. miRNA – Target Gene Prediction is a resource that was developed with methods that make use of genome comparison across insect species.
Collects predicted microRNA targets. PITA includes lists of predicted microRNA targets in worm, mouse, fly and human. For each organism, the top predictions or the complete list of predictions are available to download. It uses standard settings to identify initial seeds for each microRNA in the 3’ untranslated region (UTR). It calculates a total interaction score for the microRNA and UTR by combining sites for the same microRNA.
A Plant MicroRNA Target Expression Database to study the microRNA (miRNA) functions by inferring their target gene expression profiles among the large amount of existing microarray data.
Enables exploration of the transcriptional regulatory networks of noncoding RNAs (ncRNAs) and protein-coding genes (PCGs). ChIPBase is an open database that integrates many ChIP-seq peak datasets of trans-acting factors, including transcription factors (TFs), transcription cofactors (TCFs), chromatin-remodeling factors (CRFs), other DNA-binding proteins and histone modifications. The database consists of nine web-based modules and tools.
Simplifies the studies on insulators and their roles in demarcating functional genomic domains. CTCFBSDB is an online database that includes almost 15 million experimentally determined CTCF binding sites across several species. This repository contains several features: (1) inclusion of genomic topological domains defined using Hi-C data; (2) identification of CTCF-binding sequences that overlap a given CTCF-binding sequence; (3) inclusion of occupancy data; (4) classification of motif match type; and (5) integration with Genome Browser.
A database of predicted transcription factors in completely sequenced genomes. The predicted transcription factors all contain assignments to sequence specific DNA-binding domain families. The predictions are based on domain assignments from the SUPERFAMILY and Pfam hidden Markov model libraries.
Provides a comprehensive parts list of functional elements in the human genome. ENCODE is an online resource that includes elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. This corpus of data provides an astounding resource for annotation, curation and functional characterization in the human and mouse genomes in a large variety of sample types.
A Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. In the first release, factorbook contains 457 ChIP-seq datasets on 119 TFs in a number of human cell lines, the average profiles of histone modifications and nucleosome positioning around the TF-binding regions, sequence motifs enriched in the regions and the distance and orientation preferences between motif sites.
Collects information about transcription factors (TFs) in the fruit fly Drosophila melanogaster. FlyMine provides results from the curation of over 1052 candidate TFs selected for the presence of a canonical DNA-binding domain. The database allows users to search for a single gene of interest, to upload extensive gene lists, from which the genes encoding TFs will be recognised and marked, and to make lists of TFs fulfilling specific criteria.
Integrates several open access repositories of curated cis-elements, DNA motifs, and TFs into a unique repository. footprintDB systematically annotates the binding interfaces of the TFs by exploiting protein–DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface residues involved in specific recognition.
Helps users retrieve and utilize chromatin immunoprecipitation (ChIP) data in public domains. hmChIP is a database that contains ChIP-chip and ChIP-seq samples collected from GEO, SRA and ENCODE at UCSC. Users can search for available experiments by providing protein names, cell types and/or a list of genomic regions. This resource permits scientists to retrieve protein–DNA binding intensities from individual samples for user-provided genomic regions.
HOCOMOCO / HOmo sapiens COmprehensive MOdel COllection
Provides non-redundant curated binding models. HOCOMOCO is a comprehensive and carefully hand-curated collection of Transcription Factors Binding Sites (TFBS) models with reduced redundancy of model associations to individual transcription factor (TF). This website provides a system of interactive filters making it easier to browse the tables of the collection. To facilitate a practical application, all models are linked to gene and protein databases.
Offers a collection of extensively curated, non-redundant profiles collected from published collections of Transcription Factors Binding Sites (TFBS) from multicellular eukaryotes. JASPAR provides a web portal with a graphical interface for casual users, enabling browsing and database search functions, as well as basic sequence search functionality for selected profiles. It can also be used for seeking models for specific factors or structural classes, or if experimental evidence is paramount.
KDBI / Kinetic Data of Bio-molecular Interaction database
Provides experimentally determined kinetic data of protein–protein, protein-RNA, protein-DNA, protein-ligand, RNA-ligand, DNA-ligand binding or reaction events described in the literature. KDBI contains information about binding or reaction event, participating molecules (name, synonyms, molecular formula, classification, SWISS-PROT AC or CAS number), binding or reaction equation, kinetic data and related references.
One stop shopping experience for transcription factors and regulatory sequence annotations. It is a software framework for the construction and maintenance of regulatory sequence data annotations.
Provides experimentally determined thermodynamic interaction data between proteins and nucleic acids. ProNIT is useful for researchers wishing to study the underlying mechanisms of protein stability upon mutations and protein-nucleic acid interactions. This online resource contains about 4900 entries from more than 270 research papers. It includes proteins from a variety of organisms; the majority of the interaction data are for proteins from Escherichia coli, Mus musculus or Homo sapiens.
Catalogs over 1,200 position weight matrices (PWMs) for 196 different yeast transcription factors.
Compiles regulatory site annotations for 18 organisms. SwissRegulon deals with both prokaryotes and eukaryotes and furnishes information about position-specific weight matrices (WMs), promoters and predictions of transcription factor binding sites (TFBSs), curated from literature or derived from experiments. Users can explore annotations for promoters, genes or regulators of interest or download data of interest as a flat file.
TFinDIT / Transcription Factor-DNA Interaction Data deposITory
A relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria.
Provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes. TRANSCompel contains data on eukaryotic transcription factors experimentally proven to act together in a synergistic or antagonistic manner.
UniPROBE / Universal PBM Resource for Oligonucleotide Binding Evaluation
Gathers data about in vitro DNA binding specificities of proteins. UniPROBE is a database including DNA binding data about more than 500 non-redundant proteins and complexes from various organisms generated by using universal protein-binding microarray (PBM) technology. It provides information related to k-mers, position weight matrices and graphical sequence logos. Searches can be made by text search, similar motifs or by transcription factor (TF) sites.
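Several entries in this list distribute position weight matrices (PWMs). As a rough illustration of how such a matrix is typically used, the sketch below converts a count matrix into log-odds scores and scans a sequence for the best-scoring window. The counts, the uniform background, the pseudocount and all function names are hypothetical choices made for this example; they are not taken from UniPROBE or any other database listed here.

```python
import math

# Hypothetical counts for a 4-bp motif (one dict per motif position).
# Illustrative values only; not taken from any database.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds of each base at its motif position."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the best window, or None if the sequence is too short."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits) if hits else None


pwm = build_pwm(COUNTS)
score, start = best_hit(pwm, "TTACGTAA")
```

With these toy counts the best window starts at index 2 (the substring ACGT). Real PWM scanners usually also score the reverse strand and apply a score threshold, which this sketch omits.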
YEASTRACT / Yeast Search for Transcriptional Regulators And Consensus Tracking
Enables an interactive global view of genomic-scale regulatory networks. YEASTRACT integrates published, experimentally validated transcriptional regulatory data for S. cerevisiae. This resource contains more than 41 000 regulatory associations based on DNA binding evidence and about 172 000 based on expression evidence. It holds analysis tools for investigation of the transcriptional regulation of genes involved in a particular biological response.
AGRIS / Arabidopsis Gene Regulatory Information Server
Provides a portal for Arabidopsis gene regulatory information. AGRIS is a repository composed of three main panels: (i) AtcisDB, which compiles upstream regions of annotated Arabidopsis genes; (ii) AtTFDB, which displays information on transcription factors (TFs); and (iii) GRG-X, which allows users to browse direct interactions between TFs and target genes. In addition, the platform provides links to external resources.
Generates a map of predicted transcription factor binding sites (TFBS) and small RNA target sites for the whole Arabidopsis thaliana genome. AthaMap can be used for bioinformatic predictions of putative regulatory sites. Several online web tools are available that address specific questions. Starting with the identification of transcription factor-binding sites (TFBS) in any gene of interest, colocalizing TFBS can be identified as well as common TFBS in a set of user-provided genes. Furthermore, genes can be identified that are potentially targeted by specific transcription factors or small inhibitory RNAs.
Focuses on specific binding elements on known genes, found with experimental methods.
DATF / Database of Arabidopsis Transcription Factors
Collects all Arabidopsis transcription factors (1922 loci and 2290 gene models in total) and classifies them into 64 families.
DATFAP / Database of Transcription Factors with Alignments and Primers
Assists users in all areas of plant molecular biology working with transcription factors. DATFAP is a graphical resource equipped with a search facility and specific primers for sequences. Transcription factors from plants and green algae are collected and made available together with specific primers, homology alignments and sequence phylogenies. Users give directives to the search engine via input to four text boxes.
DPTF / Database of Poplar Transcription Factors
A plant transcription factor (TF) database containing 2576 putative poplar TFs distributed in 64 families. These TFs were identified from both computational prediction and manual curation. We have provided extensive annotations including sequence features, functional domains, GO assignment and expression evidence for all TFs. In addition, DPTF contains cross-links to the Arabidopsis and rice transcription factor databases making it a unique resource for genome-scale comparative studies of transcriptional regulation in model plants.
DRTF / Database of Rice Transcription Factors
A collection of known and predicted transcription factors of Oryza sativa L. ssp. indica and Oryza sativa L. ssp. japonica.
GRASSIUS / Grass Regulatory Information Server
Provides a public web resource composed of a collection of databases and computational and experimental resources that relate to the control of gene expression in the grasses, and their relationship with agronomic traits.
Provides access to relevant annotations of large transcription factor (TF) sets of three important legumes as well as tools for comparative genomic analyses. LegumeTFDB is an online resource that enables comparative genomics of TF repositories within legume species and among legumes, non-legume plants and other organisms. It also supplies links to either species-specific TF databases such as DATF, RARTF, AtTFDB, DRTF or integrative TF databases.
Covers motifs found in plant cis-acting regulatory DNA elements. PLACE contains some motifs in non-plant cis-elements in the hope of assisting researchers in finding plant homologues. It offers functions allowing keyword search, signal scan search, or homology search in FASTA format. This database furnishes a brief definition and description of each motif, and relevant literature with PubMed ID numbers and GenBank accession numbers.
A phylogeny-based comprehensive resource of plant transcription associated proteins.
A web-based analysis tool that is designed to identify and categorize plant TF/TR/CR genes from genome-scale protein and nucleic acid sequences by systematically analyzing InterProScan domain patterns in protein sequences.
PlantTFDB / Plant Transcription Factor Database
A database for the functional and evolutionary study of plant transcription factors. As of version 4.0, PlantTFDB offers 320 370 TFs from 165 species. Three types of annotation provide more direct clues for investigating the underlying functional mechanisms: (i) a set of high-quality, non-redundant TF binding motifs derived from experiments, (ii) multiple types of regulatory elements identified from high-throughput sequencing data and (iii) regulatory interactions curated from literature and inferred by combining TF binding motifs and regulatory elements.
PlnTFDB / Plant Transcription Factor Database
Compiles data on putatively complete sets of transcription factors (TFs) and transcriptional regulators (TRs) from 19 plant species. PlnTFDB is a repository with the aim of determining and recording plant genes involved in transcriptional control. It provides information about the different regulatory proteins and their classification into families, sequence alignments, as well as literature references and links to other databases.
RARTF / RIKEN Arabidopsis Transcription Factor database
Database and Tools for Complete Sets of Arabidopsis Transcription Factors.
A knowledge database for all the transcription factors in the soybean genome. SoyDB contains protein sequences, predicted tertiary structures, putative DNA binding sites, domains, homologous templates in the Protein Data Bank, protein family classifications, multiple sequence alignments, consensus protein sequence motifs, web logo of each family, and web links to the soybean transcription factor database PlantTFDB, known EST sequences, and other general protein databases including Swiss-Prot, Gene Ontology, KEGG, EMBL, TAIR, InterPro, SMART, PROSITE, NCBI, and Pfam. The database provides rich annotations, and can be browsed and retrieved through convenient web interfaces. The automated process generates annotations and creates the database and website, and can be used to annotate other sequenced species.
It came into being as a database of tobacco transcription factors, at the time possibly the largest collection of transcription factor sequences from a single plant species (over 2,500 genes).
Provides the complete transcription factor (TF) repertoires of six genome-sequenced tree species (papaya, jatropha, cassava, poplar, castor bean and grapevine), derived from the annotated genes of each genome.
LCB-DWH / Linnaeus Centre for Bioinformatics Data Warehouse
A web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis.
PEPR / Public Expression Profiling Resource
Provides centralized Affymetrix expression profiling data to the public research community, typically before publication in primary research papers. Data released through PEPR are generated within a single centralized research group (Children’s National Medical Center, Microarray Center), with projects originating internally and referred from external institutions. The web interface enables users to export many forms of data associated with any particular profile, including raw image files (.dat), processed image files (.cel) and interpretation files (.txt). It allows researchers to perform on‐line queries of expression profiles by any number of experimental variables (tissue, species, chip type, etc.).
Provides a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic and macrophage cells functions and host-parasite interactions. The Genopolis Database system allows the community to build an object based MIAME compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform.
Facilitates workflow management of spotted microarray experiment production, provides an efficient way to gather complete experimental information, and supports collaborative work. Thanks to its well-defined core database architecture, MicroGen facilitates collection and storage of all experimental information according to the MIAME standard. The ordered availability of this information allows efficient and effective subsequent analyses of experimental results. MicroGen also facilitates comparison of experimental data: it allows quantitative results to be saved in a standard text format, which increases their portability and compatibility. Because complete experimental information is stored in an orderly way within the system, results from experiments with similar characteristics are also easier to identify. The MicroGen graphical user interface is simple and intuitive, providing an easy way for a biologist or a biotechnology technician to read or collect information about performed microarray experiments.
RAD / RNA Abundance Database
Provides a comprehensive MIAME-supportive infrastructure for gene expression data management and makes extensive use of ontologies. Specific details on protocols, biomaterials, study designs, etc. are collected through a user-friendly suite of web annotation forms. Software has been developed to generate MAGE-ML documents to enable easy export of studies stored in RAD to any other database accepting data in this format (e.g. ArrayExpress). This infrastructure enables a large variety of queries that incorporate visualization and analysis tools and have been tailored to serve the specific needs of projects focusing on particular organisms or biological systems.
SMD / Stanford Microarray Database
A research tool and archive that allows hundreds of researchers worldwide to store, annotate, analyze and share data generated by microarray technology. SMD supports most major microarray platforms, is MIAME-supportive, and can export or import MAGE-ML. The primary mission of SMD is to be a research tool that supports researchers from the point of data generation to data publication and dissemination, but it also provides unrestricted access to analysis tools and public data from 300 publications. In addition to supporting ongoing research, SMD makes its source code fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD.
UNC Microarray Database / University of North Carolina Microarray Database
Provides a service for microarray data storage, retrieval, analysis, and visualization. Access to non-public data is limited to registered University of North Carolina – Chapel Hill researchers and their collaborators.
Provides high-quality reference alignments based on 3D structural superpositions. BAliBASE is a large-scale benchmark specifically designed for multiple sequence alignment. The alignment test cases are manually refined to ensure the correct alignment of conserved residues. The alignments are organized into reference sets designed to represent real multiple alignment problems.
HOMSTRAD / HOMologous STRucture Alignment Database
Allows study of both sequence and structure relationships between homologous proteins. HOMSTRAD is a database that provides combined protein sequence and structure information extracted from the Protein Data Bank (PDB), a primary protein structure repository, and relies heavily on other databases, especially Pfam and SCOP. It contains about 2700 families, just under half of which are multi-member.
Includes data and software to evaluate the accuracy of protein multiple sequence alignments.
PREFAB / Protein Reference Alignment Benchmark
Tests multiple sequence alignment methods. PREFAB is a test set that draws on methodology, test data and statistical methods previously applied to alignment accuracy assessment. It contains (i) a set of reference pair alignments, (ii) samples of sequences for testing multiple alignment programs and (iii) a program for assessing the quality of the alignments those programs produce.
Provides a database of functional genomics experiments. ArrayExpress includes data generated by sequencing or array-based technologies. This resource integrates the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries, provided via ontology-enabled interfaces, can be based on technology and on sample attributes such as disease, cell type and anatomy.
Facilitates the capture and management of structured metadata and data for diverse biological research samples. BioSample provides a dedicated area that presents collected project metadata, including the project data type (such as genome sequencing, transcriptome or gene expression); attributes concerning the sample scope and target, method and project goals; and the submitting group, title, organism name or environmental sample label, and a brief description.
BioSD / BioSamples Database
Provides a collection of biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. BioSD enables researchers to submit sample descriptions once and reference them later in data submissions to assay databases. It stores and links sample information within EBI databases such as ENA, ArrayExpress and PRIDE. This database is queryable through specific filters, such as disease or organism.
DOR / DDBJ Omics Archive
A public functional genomics data repository supporting MIAME- and MINSEQE-compliant data submissions. Array- and sequence-based data are accepted in the MAGE-TAB format.
DRA / DDBJ Sequence Read Archive
An archive database for output data generated by next-generation sequencing machines including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, and others.
EGA / European Genome-phenome Archive
A permanent archive that promotes the distribution and sharing of genetic and phenotypic data consented for specific approved uses but not for fully open, public distribution. The EGA follows strict protocols for information management, data storage, security and dissemination. Authorized access to the data is managed in partnership with the data-providing organizations. The EGA includes major reference data collections for human genetics research.
ENA / European Nucleotide Archive
Offers a repository for information related to nucleotide sequencing workflows. ENA provides a data model containing input information, output machine data and interpreted information. The database gathers a wide range of information as well as raw sequence data and derived data, including sequences, assemblies and functional annotation, accompanied by studies and samples to provide experimental context.
GEO / Gene Expression Omnibus
Provides high-throughput microarray and next-generation sequence (NGS) functional genomic data sets. GEO archives raw data, processed data and metadata submitted by the research community. Its data are indexed, cross-linked and searchable. This database gives access to several tools and graphical renderings allowing users to easily explore and interpret data available on the platform. It can be useful to develop and test new hypotheses.
SRA / Sequence Read Archive
Stores raw sequence data from next-generation sequencing (NGS) technologies. SRA is a database which works as a core infrastructure for sharing of pre-publication sequence data, with the aim to make sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. The database also stores alignment information in the form of read placements on a reference sequence.
NIF / Neuroscience Information Framework
Gives access to a searchable collection of neuroscience data, a catalog of biomedical resources, and an ontology for neuroscience. NIF is a dynamic inventory of web-based neuroscience resources designed to serve neuroscience investigators by facilitating directed and intelligent access to data and findings, aiding integration, synthesis, and connectivity across related data and findings, stimulating new and enhanced development of neuroinformatic resources, and enabling new and enhanced analyses of data.
PMC / PubMed Central
A free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health’s National Library of Medicine (NIH/NLM). As an archive, PubMed Central is designed to provide permanent access to all of its content, even as technology evolves and current digital literature formats potentially become obsolete.
Comprises more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
A curated collection of chaperonin sequence data collected from public databases or generated by a network of collaborators exploiting the cpn60 target in clinical, phylogenetic and microbial ecology studies.
ACLAME / A CLAssification of Mobile genetic Elements
Collects and classifies mobile genetic elements (MGEs), including phages and plasmids, from various sources. ACLAME provides a platform for analyzing MGE diversity from a global scale down to specific groups of MGEs, and tools for the detection of new MGEs integrated in bacterial genomes. The BLAST search interface allows simple querying of the ACLAME sequences and returns, for each hit sequence, information such as its functional annotation and the MGE, host(s) and protein families it belongs to.
ARNIE / AVEXIS Receptor Network with Integrated Expression
An online database that integrates the extracellular protein interaction network. ARNIE allows users to browse the network by clicking on individual proteins, or by specifying the spatiotemporal parameters using the drop-down menus. Clicking on connector lines will allow users to compare stage-matched expression patterns for genes encoding interacting proteins. Additionally, users can rapidly search for their genes in the network using the BLAST server provided.
BSRD / Bacterial Small Regulatory RNA Database
Provides a bacterial sRNA repository. BSRD characterizes sRNA in large-scale transcriptome sequencing projects. This database offers users functional descriptions for sRNA and contains annotations from manually curated literature mining, such as growth phase, Hfq binding and Rho-independent terminators. Large-scale target prediction for identified sRNA is also available, and the database includes more than 9000 sRNA entries from up to 900 bacterial strains.
CCDS / Consensus Coding Sequence
Tracks identical protein annotations on the reference mouse and human genomes with a stable identifier. CCDS is a resource that supports consistent, comprehensive annotation of the protein-coding content of the human and mouse genomes. It is built by consensus; each member of the collaboration contributes annotation, quality assessments, and curation. These data sets can be accessed from several public resources.