Intends to describe the cellular architecture of a unicellular organism or of a cell type from a multicellular organism, as well as the particular cellular components from which a specific type of cell is built. CCO includes about 160 terms, each with a common name and a definition. Many terms also include synonyms, a reference for the definition, related terms, and a 'sensu' slot indicating the taxonomic class of the organisms to which the term applies.
Eases the filtering and prioritization of presumably functional single nucleotide variants (SNVs) from the long list of SNVs identified in a typical whole exome sequencing (WES) study. dbNSFP can work as a local, self-contained database without an internet connection. The database provides more than 82 800 000 non-synonymous SNVs (nsSNVs) and splice-site SNVs (ssSNVs).
Provides a resource for investigating the relationships among genetic variation, gene expression, and other molecular phenotypes in multiple human tissues. GTEx gives a unified view of genetic effects on gene expression across a broad range of tissue types, most of which had not previously been studied for expression quantitative trait loci (eQTLs). This dataset was built from a collection of 900 donors.
Gives a multi-dimensional view of medical states. iPOP covers healthy states, response to viral infection, recovery, and Type 2 Diabetes (T2D) onset. The database aims to (1) determine the genome sequence at high accuracy and evaluate disease risks, (2) survey omics components over time and integrate the relevant omics information to assess variation in physiological states, and (3) investigate the expression of personal variants at the RNA and protein levels.
Represents a cloud-based, cooperative community resource. GenomeSpace contains a large collection of bioinformatics tools and allows seamless transitions between them; this large set of connected tools enriches the interpretation of integrative analyses. Users can explore the same data in multiple tools, so that an analysis can be examined in greater depth and diversity than with any single tool.
A dedicated database for bacterial insertion sequences (ISs). One of its functions is to assign IS names and to provide a focal point for a coherent nomenclature. It is also the repository for ISs. Each new IS is indexed together with information such as its DNA sequence and open reading frames or potential coding sequences, the sequence of the ends of the element and target sites, its origin and distribution together with a bibliography where available. Another objective is to continuously monitor ISs to provide updated comprehensive groupings or families and to provide some insight into their phylogenies.
Contains bioinformatics workflows and allows users to share them. myExperiment is a collaborative repository and social network for workflows and related research objects, regardless of their format or native platform, which enables users to share and reuse digital experimental protocols and supports reproducible science. The database provides a flexible authorization model, allowing any uploaded content to be made available with varying levels of sharing permissions.
Allows users to search whole prokaryotic genomes for intrinsic terminators. TransTermHP permits prediction of Rho-independent transcription termination in bacterial genomes. This method assists in the detection of signals in genomic DNA. It assigns each candidate terminator a score related to the likelihood that it arose by chance. Moreover, it is designed to detect the common, classic intrinsic terminator motif: a hairpin stem followed by a poly-U tail.
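The classic terminator motif described above (a hairpin stem followed by a U-rich tail, which appears as a T-rich tail on the DNA sense strand) can be illustrated with a deliberately naive scanner. This is a minimal sketch, not TransTermHP's algorithm: it assumes a perfectly complementary stem of fixed length, a short loop and a simple T-count threshold, and it computes neither hairpin energies nor the chance-based score mentioned above. The function `find_naive_terminators` and all of its thresholds are hypothetical.

```python
# Watson-Crick complements for uppercase DNA bases only
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT-only DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def find_naive_terminators(dna, stem_len=6, loop_min=3, loop_max=8,
                           tail_len=6, min_t=5):
    """Return (start, end) spans of perfect hairpins followed by a T-rich tail.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    dna = dna.upper()
    hits = []
    last_start = len(dna) - (2 * stem_len + loop_min + tail_len)
    for i in range(last_start + 1):
        left = dna[i:i + stem_len]
        for loop in range(loop_min, loop_max + 1):
            j = i + stem_len + loop          # start of the right stem arm
            right = dna[j:j + stem_len]
            tail = dna[j + stem_len:j + stem_len + tail_len]
            if len(right) < stem_len or len(tail) < tail_len:
                break                        # longer loops run off the end too
            # Perfect stem: right arm is the reverse complement of the left arm
            if right == revcomp(left) and tail.count("T") >= min_t:
                hits.append((i, j + stem_len + tail_len))
    return hits
```

Real terminator predictors tolerate mismatches and G-U pairs in the stem and score hairpin stability, so exact string matching like this misses many genuine terminators.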
Offers a draft de novo assembly of the gorilla Y chromosome. GorillaY was created using an integrated strategy to sequence and assemble the gorilla Y chromosome. Y chromosome-specific reads were extracted with RecoverY, an algorithm developed in-house. Approximately 12,000 copies of the Y chromosome were flow-sorted from a fibroblast cell line of a male western lowland gorilla.
Hosts data, software, tools, results output and crosslinks to other resources that fully enable reproducibility of the associated GigaScience articles; a subset of data sets is also associated with other journals. GigaDB is a database in which a data set is defined as a group of files related to and supporting an article or study. Its scope covers large-scale data from the life sciences. There is no restriction on data size, which makes it a home for harder-to-access data types (e.g. imaging, neuroscience, ecology) as well as for software used to analyze large-scale data sets.
A wiki resource of the functional consequences of human genetic variation as published in peer-reviewed studies. Online since 2006 and freely available for personal use, SNPedia has focused on the medical, phenotypic and genealogical associations of single nucleotide polymorphisms. Entries are formatted to allow associations to be assigned to single genotypes as well as sets of genotypes (genosets).
Makes whole-genome data integration accessible to life scientists with minimal computational expertise. ORIO is a web-based resource with an intuitive user interface for the rapid analysis and integration of next generation sequencing (NGS) data. It first iteratively computes read coverage values at each genomic feature for each NGS dataset. The data are then integrated using clustering-based approaches, which reveal hierarchical relationships across NGS datasets and separate individual genomic features into groups.
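ORIO's first step, computing read coverage at each genomic feature for each dataset, can be sketched as follows. This is an illustrative sketch, not ORIO's code: it assumes each read is reduced to a single (chromosome, position) point and that features are half-open intervals; the function name `coverage_matrix` is hypothetical. The per-feature count vectors it returns are the kind of input a clustering step would then group.

```python
from bisect import bisect_left


def coverage_matrix(features, datasets):
    """Count reads falling in each feature, for each dataset.

    features: list of (chrom, start, end) half-open intervals.
    datasets: dict mapping dataset name -> list of (chrom, pos) read positions.
    Returns dict mapping dataset name -> list of counts, one per feature.
    """
    matrix = {}
    for name, reads in datasets.items():
        # Group read positions by chromosome and sort them for binary search
        by_chrom = {}
        for chrom, pos in reads:
            by_chrom.setdefault(chrom, []).append(pos)
        for positions in by_chrom.values():
            positions.sort()
        counts = []
        for chrom, start, end in features:
            positions = by_chrom.get(chrom, [])
            # Number of positions p with start <= p < end
            counts.append(bisect_left(positions, end) - bisect_left(positions, start))
        matrix[name] = counts
    return matrix
```

Sorting once per chromosome and using binary search keeps each feature lookup logarithmic in the number of reads, which matters when there are many features per dataset.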
Compiles information about cAMP-response element binding protein (CREB) target genes. For each entry, the CREB Target Gene Database aims to provide the CREB binding sites on the promoter, their occupancy, and gene activation by cAMP in tissues. Searches can be made by GenBank accession number, gene symbol, gene name, or LocusLink number.
Serves to retrieve information about sequence-tagged sites. MSY Breakpoint Mapper is a database of sequence-tagged sites (STSs) and an interface for examining deletions in the male-specific region of the Y chromosome (MSY). This resource contains more than 1200 Y-specific STSs, each of which is operationally defined by a polymerase chain reaction (PCR) assay.
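Because each STS is a PCR assay that either amplifies or fails, a deletion can be bracketed between the last STS that amplifies and the next one that does. The sketch below illustrates that reasoning, assuming the STS order along the MSY is known and that a failed PCR always reflects a true deletion; the STS names and the function `deletion_intervals` are hypothetical and not part of MSY Breakpoint Mapper.

```python
def deletion_intervals(ordered_sts, pcr_present):
    """Bracket deletions using ordered STS PCR results.

    ordered_sts: STS names in chromosomal order.
    pcr_present: dict mapping STS name -> True if the PCR amplified.
    Returns a list of (last_present_before, deleted_sts, first_present_after);
    None marks a deletion that reaches either end of the typed STS panel.
    """
    intervals = []
    run = []             # consecutive STSs that failed to amplify
    last_present = None
    for sts in ordered_sts:
        if pcr_present[sts]:
            if run:
                intervals.append((last_present, run, sts))
                run = []
            last_present = sts
        else:
            run.append(sts)
    if run:  # deletion extends past the last typed STS
        intervals.append((last_present, run, None))
    return intervals
```

Each breakpoint then lies between a flanking STS that amplified and the adjacent STS that did not.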
Provides information about genome dynamics during the life of the cell. DNAtraffic is an annotated resource that contains data about: (i) DNA metabolism; (ii) proteins involved in DNA metabolism, understood in the broad sense; (iii) DNA damage (damage type, damage source and damage effect); (iv) diseases related to the assembled human proteins; and (v) drugs targeting nucleic acid metabolism and proteins involved in the maintenance of genome stability. The database is aimed at scientists, pharmacologists and students.
Provides a manually curated collection of disease-associated enhancers. DiseaseEnhancer is a database that makes all disease-associated enhancer information publicly available in one location, providing an important, continuously updated resource that facilitates the understanding of regulatory mechanisms in disease pathogenesis. It also provides a mutation map plot that shows the mutations mapped onto disease-associated enhancers, to help understand the roles of enhancers in diseases.
Supports clinical diagnostics through an ontology-based search routine. Phenomizer ranks candidate diseases according to their semantic similarity to the query terms. It gives a p value indicating whether the similarity scores of the best-matching candidate diseases are significantly better than would be expected by chance. This platform aims to guide the differential diagnostic process in human genetics.
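Ranking diseases by semantic similarity can be sketched with an information-content (IC) measure: for each query term, find the most informative ancestor it shares with any of a disease's annotated terms, then average over the query terms. This is a simplified, Resnik-style sketch under stated assumptions, not Phenomizer's implementation; the toy ontology, the disease annotations and all function names are hypothetical, and the p-value computation is omitted.

```python
import math

# Toy ontology (hypothetical terms, not real HPO identifiers): term -> parents
PARENTS = {
    "Abnormality": [],
    "Eye": ["Abnormality"],
    "Cataract": ["Eye"],
    "Glaucoma": ["Eye"],
    "Heart": ["Abnormality"],
    "Murmur": ["Heart"],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(annotations):
    """IC(t) = -log(fraction of diseases annotated with t or a descendant)."""
    counts = {t: 0 for t in PARENTS}
    for terms in annotations.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def best_match(q, disease_terms, ic):
    """Highest IC among ancestors shared by query term q and any disease term."""
    best = 0.0
    for d in disease_terms:
        for a in ancestors(q) & ancestors(d):
            best = max(best, ic.get(a, 0.0))
    return best


def rank(query, annotations):
    """Sort diseases by average best-match IC over the query terms."""
    ic = information_content(annotations)
    scores = {
        d: sum(best_match(q, terms, ic) for q in query) / len(query)
        for d, terms in annotations.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Sharing a specific term (low frequency, high IC) counts for more than sharing only a general one, which is what lets the ranking separate closely related diseases.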
An RNA-sequencing data set. rnaseqmixture provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The classic mixture design allows precision and bias to be quantified via a non-linear model for each gene, which serves as a basis for comparing different sample preparation methods. This mixture design also allows internal comparisons to be made within methods when benchmarking differential expression and differential splicing analysis methods.
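A minimal sketch of why a mixture design leads to a non-linear model on the log scale: if two pure samples A and B are mixed in proportion p:(1-p) and mixing acts linearly on unlogged expression (an assumption made here), the expected log2 expression of a gene is log2(p*2^a + (1-p)*2^b), where a and b are its log2 expression in A and B. The function names below are hypothetical and do not reproduce rnaseqmixture's actual model fitting.

```python
import math


def expected_log_expr(a_log, b_log, p):
    """Expected log2 expression of a p:(1-p) mixture of pure samples A and B,
    assuming mixing is linear on the unlogged expression scale."""
    return math.log2(p * 2 ** a_log + (1 - p) * 2 ** b_log)


def estimate_proportion(a_log, b_log, observed_log, steps=1000):
    """Grid-search the mixing proportion that best fits one gene's observation."""
    best_p, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        p = k / steps
        err = (expected_log_expr(a_log, b_log, p) - observed_log) ** 2
        if err < best_err:
            best_p, best_err = p, err
    return best_p
```

Comparing a gene's fitted proportion with the known mixing proportion is one way to expose bias, and the spread of fits across replicates reflects precision.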
A charitable organization working to generate, aggregate and interpret human biological and trait data on an unprecedented scale. Open data is a critical component of the scientific method, but genomes are both identifiable and predictive. As a result, many studies choose to withhold data from participants and restrict access to researchers. The public data of the Personal Genome Project (PGP) provide a common ground for collaborating on and improving our understanding of genomes.
Facilitates feature retrieval and classification for very large single nucleotide variant (SNV) datasets. SNVBox is a database of pre-computed predictive features that can be used to aid the development of classification algorithms predicting the impact of either germline or somatic SNVs. The features have been pre-computed for each codon in all protein-coding exons of annotated human mRNA transcripts.
A comparison of different NGS read mappers, providing detailed performance comparisons of NGS read aligners. Benchmarks only measure specific aspects and should not be used to claim any universal superiority or inferiority of a particular tool. To improve this benchmark, readers are encouraged to reproduce these data and to propose alternative benchmarks.
Gathers expression networks and associated visualization tools for model forest tree species. PlantGenIE is a platform for exploring Populus, conifer and Arabidopsis genomics data, including genome browsers, gene list annotation, BLAST homology searches and gene information pages. This intuitive platform gives access to large-scale, genome-wide genomics data to inform biological insight.
Provides a list of pre-computed database identifier conversions for four organisms: Homo sapiens, Mus musculus, Danio rerio and Rattus norvegicus. dbOrg consists of a list of downloadable files covering multiple input ID formats, including Ensembl and Affymetrix, which are converted into other formats, with both the input and output versions specified.
Deals with nitrogen (N) cycle gene profiling from shotgun metagenomes. NCycDB is a manually curated database intended to assist users in building knowledge-based functional gene databases that exploit information gathered from public repositories such as KEGG or eggNOG. The database consists of three downloadable files: two of them provide representative sequences at different sequence identity thresholds, and the third supplies a mapping file that links gene names to sequence IDs.
Contains traits on clonal growth and vegetative regeneration for the European temperate flora. CLO-PLA is a database that can help assess the roles of vegetative regeneration and spread in plant communities under the effect of various biotic and abiotic filters. It can serve as a reference source on persistence traits of the European temperate flora and, eventually, as a guide for trait sampling in other regions of the world. CLO-PLA provides basic and detailed information on the most widely applied terms required for studying vegetative organs, which offers an easy self-education opportunity for researchers wishing to improve their expertise in this field.
Generates a fully structured local database with an intuitive, user-friendly graphical interface for personal computers. GeneBase is a full parser of the National Center for Biotechnology Information (NCBI) Gene database. It allows users to perform original searches, calculations and analyses of the main information on genes that are fully annotated in the ‘Gene Table’ section of NCBI Gene. Furthermore, for a subset of gene records, it integrates nucleotide sequences, useful for further analysis, with the corresponding gene-associated meta-information.
Allows users to discover, download and deposit large structural biology datasets. SBDG is a flexible data publication system that allows deposition of a variety of large primary datasets. The database collection is limited to datasets that support journal publications, referred to as primary data. Datasets are stored as sets of experimental metadata (experimenters, sample, collection facility) and associated files and directories comprising the data of interest. SBDG can facilitate integration of the Data Grid with regional projects and preservation of primary diffraction datasets.
Provides results of comparative genomic hybridization (CGH) analyses on more than 900 solid tumors of various types. MCG CNV Database offers copy number variants (CNVs) and losses of heterozygosity (LOH) detected through microarray analyses in a healthy Japanese population. This resource can help estimate the pathogenicity of CNVs or LOH detected in subjects whose pathogenesis may involve cryptic genomic aberrations.
An open database that allows participants in direct-to-consumer genetic testing to publish their genetic data, along with phenotypic information, at no cost. Through this crowdsourced effort of collecting genetic and phenotypic information, openSNP has become a resource for a wide range of studies, including genome-wide association studies.
Automatically augments Gene Ontology annotations (GOA) with additional context. PhenoGO is a multi-organism database that integrates existing Gene Ontology annotations with phenotypic context using a number of widely used structured ontologies. It was developed to facilitate high-throughput mining of the experimental, phenotypic or disease contexts associated with gene-to-GO annotations.
Allows validation and analysis of human leukocyte antigen (HLA)-typed population samples and their connection with the GENE[VA] database. HLA-NET is a platform conceived as an evolving set of tools and utilities for the HLA community. These tools enable users to handle and analyze data containing ambiguities. The platform also contains a database of European, North African and West Asian population samples typed for HLA.
Identifies region-specific single nucleotide polymorphisms (SNPs) in which the polymorphic nucleotide creates a restriction fragment length polymorphism (RFLP) that can be readily assayed at the benchtop using restriction enzyme digestion of SNP-containing PCR products. SNP2RFLP permits user-defined queries that maximize the number of informative markers for a specific application and allows users to retrieve an adequate, manageable number of markers. This tool facilitates fine-mapping in a region containing a mutation of interest.
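The core test behind an RFLP marker is whether one allele of the SNP creates or destroys a restriction enzyme recognition site that spans the SNP. The sketch below checks this for a single enzyme, assuming the EcoRI site GAATTC by default; because that site is palindromic, scanning one strand is enough, whereas a non-palindromic site would also require scanning the reverse complement. The function `snp_rflp` is hypothetical and is not SNP2RFLP's code.

```python
def sites(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    w = len(motif)
    return {i for i in range(len(seq) - w + 1) if seq[i:i + w] == motif}


def snp_rflp(flank5, ref, alt, flank3, motif="GAATTC"):
    """Classify a SNP by whether the alt allele creates or destroys a site.

    Only occurrences overlapping the SNP position are considered.
    motif defaults to the EcoRI recognition site (palindromic).
    """
    ref_seq = (flank5 + ref + flank3).upper()
    alt_seq = (flank5 + alt + flank3).upper()
    snp = len(flank5)  # 0-based position of the SNP

    def spanning(s):
        return {i for i in sites(s, motif) if i <= snp < i + len(motif)}

    ref_hits, alt_hits = spanning(ref_seq), spanning(alt_seq)
    if alt_hits and not ref_hits:
        return "alt allele creates site"
    if ref_hits and not alt_hits:
        return "alt allele destroys site"
    return "not informative"
```

For example, the C/T SNP in TTGAA[C/T]TCAA turns GAACTC into GAATTC, so the T allele creates an EcoRI site and digestion of the PCR product distinguishes the alleles.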
Allows users to perform functional analyses using Gene Ontology (GO) annotations in various organisms. DAGViz is a directed acyclic graph (DAG)-based browser that facilitates the integrative analysis of functional categories associated with a gene product by displaying all DAG-based information for multiple GO terms in a tabulated color chart. This resource provides GO annotation datasets for 45 organisms, including fungal, protist, animal, plant and bacterial species.
Provides an online search platform for antibiotic resistance genes (ARGs). ARGs-OSP is a database that offers a global profile of the antibiotic resistome. This resource was constructed by integrating two large datasets: a whole genome database (WGD) and a metagenomic database (MGD). Its search and download functions are designed to let users retrieve the occurrence of ARGs across taxa and the abundance of ARGs in different habitats.
Provides literature, tools, pipelines, pathways and Sequence Read Archive (SRA) data specific to NGS and cancer. The Comprehensive resources for cancer NGS data analysis (CRCDA) is a web portal that can be queried by (i) type of data and (ii) type of cancer. The searchable data are of three types: (i) literature and gene data, (ii) literature data and (iii) SRA data.
Provides a resource for the development and validation of exome-based clinical panels. ExomeSlicer is a web-based tool for identifying gene-specific and exome-wide technically challenging regions that cannot be reliably sequenced. Such regions can be a source of false-positive and/or false-negative variant calls. This method supports exome-based and targeted panel development by identifying where ancillary testing may be needed, characterizing test limitations, and streamlining post hoc Sanger sequencing.
Consists of a novel neural network implementation that uses ribosome profiling data to annotate translation start sites (TISs) in prokaryotes. DeepRibo combines convolutional neural network (CNN) and recurrent neural network (RNN) architectures to extract and combine information from the ribosome profiling signal and the DNA sequence. It is trained on a combination of available experiments for different bacteria and has been tested on de novo ribo-seq data of bacterial genomes.
Gathers annotation and analysis of binding sites for site-specific transcription factors in the promoter and enhancer regions of genes. NFI-Regulome provides the control regions of genes that have been shown in the primary literature to be regulated by Nuclear Factor I (NFI) transcription factors. It enables rapid comparison of the size, composition, and organizational structure of the cis-regulatory regions of NFI-regulated genes, selected by disease relevance, cell type, tissue or developmental stage.
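As an illustration of binding-site annotation, the commonly cited full consensus for NFI binding is the palindrome TTGGC(N5)GCCAA. The sketch below scans a sequence for exact matches to that consensus; it is a simplification (real NFI sites are often degenerate or half-sites) and is not how NFI-Regulome annotates its regions. The function name `find_nfi_sites` is hypothetical.

```python
import re

# Full-site NFI consensus: TTGGC, five arbitrary bases, GCCAA.
# The lookahead lets overlapping matches be reported.
NFI_FULL_SITE = re.compile(r"(?=(TTGGC[ACGT]{5}GCCAA))")


def find_nfi_sites(seq):
    """Return 0-based start positions of exact full-consensus NFI sites.

    The consensus is its own reverse complement, so scanning one strand
    finds sites on both strands.
    """
    return [m.start() for m in NFI_FULL_SITE.finditer(seq.upper())]
```

Searching for half-sites (TTGGC or GCCAA alone) or allowing mismatches would raise sensitivity at the cost of many more false positives.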
Contains and makes available the results of a bioinformatic pipeline that regularly screens public nucleotide sequence databanks, patents and available whole plant genomes through in silico determination of PCR amplification. JRC GMO-Amplicons can be queried by control laboratories to evaluate the results of screening/identification analyses, or to develop new detection methods and assess in silico primer specificity and genetically modified organism (GMO) coverage.
Gives access to adaptive laboratory evolution (ALE) mutations, conditions, and the publications reporting them. ALEdb can be searched for specific mutations, returns key mutations, and allows users to export mutation data for custom investigation. It offers features to automatically report established trends in adaptive ALE mutations. This database is composed of more than 11,000 mutations extracted from 18 ALE experiments.
Contains over 2,000 trained models covering key predictive tasks in genomics, including the prediction of chromatin accessibility, transcription factor binding, and alternative splicing from DNA sequence. Kipoi is a repository with standardized data handling that simplifies the input of genomic data types across a wide range of models. Moreover, it offers an API that allows programmers to use Kipoi models interchangeably in their software.
Consists of a catalog of global patient-derived tumor xenograft (PDX) repositories. PDX Finder uses a data model based on the minimal information standard for PDX models developed in the context of basic and pre-clinical cancer research. It is designed to integrate, standardize, analyze, and visualize data from labs performing PDX studies. This database is composed of more than 1 900 models.
Stores open-licensed educational modules. Dryad is a database which allows students in advanced secondary, undergraduate, and early graduate courses to work on real data. It allows users to discover, reuse and cite data underlying scientific publications.
Offers up-to-date information about the Gene Ontology (GO). GO.db is an R package containing annotation maps that represent the entire Gene Ontology. For example, it maps GO Biological Process (BP) terms to their ancestors according to the directed acyclic graph (DAG) defined by the Gene Ontology Consortium. This package is updated biannually.
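The kind of term-to-ancestors mapping that GO.db exposes for biological process terms (for example its `GOBPANCESTOR` map) amounts to a transitive closure over the DAG. The Python sketch below illustrates that idea only; it is not GO.db's implementation, the term names are hypothetical, and it follows every parent edge without distinguishing relationship types.

```python
from collections import deque


def ancestor_map(parents):
    """Compute every term's full set of ancestors in a DAG.

    parents: dict mapping term -> list of its direct parent terms.
    Returns dict mapping term -> set of all ancestors (excluding the term).
    """
    result = {}
    for term in parents:
        seen = set()
        queue = deque(parents[term])
        # Breadth-first walk up the DAG; 'seen' avoids revisiting terms
        # reachable through more than one path.
        while queue:
            p = queue.popleft()
            if p not in seen:
                seen.add(p)
                queue.extend(parents.get(p, []))
        result[term] = seen
    return result
```

Because a GO term can have several parents, the same ancestor is often reachable along multiple paths; the visited set ensures each is reported once.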
Gathers a set of 14 ionizing-radiation-resistant bacteria (IRRB) and 14 ionizing-radiation-sensitive bacteria (IRSB). MIL-ALIGN supplies information about proteins involved in basal DNA repair in IRRB.
Provides access to information about transcription factors. MatBase is a database containing information on transcription factors and the corresponding weight matrices used by MatInspector to locate potential binding sites of these transcription factors in DNA sequences. The repository also provides information about regulatory interactions between transcription factors and other genes, and about regulatory modules.
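The general idea of locating binding sites with a weight matrix can be sketched as a log-odds position weight matrix (PWM) scan: convert per-position base counts into log2 odds against a uniform background, then score every window of the sequence. This is a generic PWM scan under stated assumptions (uniform background, fixed pseudocount, hypothetical toy counts), not MatInspector's matrix-similarity scoring; the function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds matrix.

    counts: list of dicts (one per motif position) mapping base -> count.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((col.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[k][b] for k, b in enumerate(window))
        if score >= threshold:
            hits.append((i, round(score, 3)))
    return hits
```

The pseudocount keeps unobserved bases from producing a log of zero; the threshold trades sensitivity for specificity.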
Gathers a collection of genetic terms to help users understand the terms and concepts used in genetic research. The NHGRI Talking Glossary was built on the basis of national science standards, and all of its terms are explained by National Institutes of Health (NIH) scientists. It provides an alphabetical index for browsing terms, as well as animations explaining complex terms.
Provides continuous, objective and reproducible evaluation of genome assemblers using Docker containers. nucleotid.es compares how different genome assemblers perform on a variety of test sequence data, with multiple benchmarks showing the average performance of each assembler. Every genome assembler is examined as a self-contained Docker application. These containers eliminate a common problem in bioinformatics where software won't compile or requires multiple additional dependencies. Furthermore, users can pull an assembler from the Docker repository and start using it immediately, and developers are encouraged to submit their assemblers for inclusion in the benchmarks.