JGI Genome Portal / Joint Genome Institute Genome Portal
Gives access to genome sequences and annotations and allows exploration of genomic data. JGI Genome Portal provides worldwide statistics on the usage of JGI resources, along with information about the latest genome releases and new tool development. It can automatically generate and monitor BioSample and BioProject submissions to NCBI. The portal also gives users access to other resources, such as the Genomes OnLine Database (GOLD).
MIM notation / Molecular Interaction Maps notation
Provides a set of conventions for annotating and organizing relationships in bioregulatory systems. MIM notation describes the relationships between multiple entities using interaction glyphs, a controlled vocabulary and typographical conventions. The system can display complex sets of regulatory network interconnections or capture different cell types and cell states. It also allows the specification of known molecular data as well as the addition of contingencies.
HOMD / Human Oral Microbiome Database
A body site-specific public database providing the scientific community with comprehensive information on the prokaryote species present in the human oral cavity. This dynamic database provides a curated taxonomy of oral prokaryotes, a curated set of full-length 16S rRNA reference sequences, and BLAST tools that allow unknown isolates or clones to be identified from their 16S rRNA sequence; in addition, phenotypic, bibliographic, clinical and genomic information is linked to each taxon. Web-based interfaces and software tools facilitate querying and analysis of this comprehensive dataset.
Provides access to a diverse collection of datasets for predicting novel ciliary genes. CiliaCarta is a compendium of ciliary genes based on a naive Bayesian integration of diverse datasets. The webpage provides two tables: the CiliaCarta itself, the compendium of ciliary genes, comprising the previous SYSCILIA Gold Standard resource, Gene Ontology (GO)-annotated genes and experimentally validated predictions; and the naive Bayesian integration, including all datasets used in the analysis. The resource can be used to objectively prioritize candidate genes in whole-exome or whole-genome sequencing of ciliopathy patients.
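The naive Bayesian integration mentioned above can be illustrated with a minimal sketch. CiliaCarta's actual datasets, evidence bins and prior are not given in this entry, so everything below is an assumption: each dataset contributes a likelihood ratio estimated from hypothetical positive (known ciliary) and negative gold-standard gene counts, and the datasets are treated as conditionally independent given the class, which is the "naive" assumption. The function names are hypothetical.

```python
import math

def likelihood_ratio(pos_hits: int, pos_total: int,
                     neg_hits: int, neg_total: int,
                     pseudocount: float = 1.0) -> float:
    """Likelihood ratio for one dataset's evidence (e.g. 'gene found in this screen'):
    P(evidence | ciliary) / P(evidence | non-ciliary), estimated from hypothetical
    positive and negative gold-standard counts with a pseudocount."""
    p_pos = (pos_hits + pseudocount) / (pos_total + 2 * pseudocount)
    p_neg = (neg_hits + pseudocount) / (neg_total + 2 * pseudocount)
    return p_pos / p_neg

def posterior_probability(prior_odds: float, ratios: list[float]) -> float:
    """Naive Bayes: assuming conditionally independent datasets,
    posterior log-odds = log(prior odds) + sum of log likelihood ratios."""
    log_odds = math.log(prior_odds) + sum(math.log(r) for r in ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical gene supported by two datasets, with a hypothetical prior.
lr_screen = likelihood_ratio(pos_hits=9, pos_total=18, neg_hits=9, neg_total=98)
lr_expression = 2.0
print(posterior_probability(prior_odds=0.05, ratios=[lr_screen, lr_expression]))
```

Summing log likelihood ratios keeps the computation numerically stable when many datasets are combined; a gene's rank then follows directly from its posterior.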
Simulates collections of independent genomic datasets and performs training and validation with prediction algorithms. SimulatorZ is a package intended primarily for these two tasks. It supports ExpressionSet and RangedSummarizedExperiment objects, and it provides functions to generate useful values from the true models for further analysis and to filter genes by integrative correlation, among other things.
UCSC microbial genome browser
Provides access to more than 400 microbial species from Archaea and Bacteria. UCSC Genome Browser provides a rapid and reliable display of any requested portion of a genome at any scale, together with dozens of aligned annotation tracks. It supports text- and sequence-based searches that give quick, precise access to any region of interest. Users can look at a whole chromosome to get a feel for gene density, open a specific cytogenetic band to see a positionally mapped disease gene candidate, or zoom in on a particular gene to view its spliced expressed sequence tags (ESTs) and possible alternative splicing.
VGO / Viral Genome Organizer
Views genes and predicts Open Reading Frames (ORFs) in one or a series of genomes. VGO is an easy-to-use genome browser that can also search for sequences within genomes: it searches the translated genome for matches to mass spectrometry peptides and finds the longest oligonucleotide shared by a series of unaligned genomes. Because VGO connects to the Viral Orthologous Clusters (VOCs) database of genome sequences, it can be used to compare genomes.
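To make the "longest shared oligonucleotide" search concrete, here is a minimal sketch. VGO's own algorithm is not described in this entry; this version is an assumed, brute-force approach that intersects the sets of k-mers of every genome and binary-searches on k. It only considers the forward strand (no reverse complements), and the function name is hypothetical.

```python
def longest_shared_oligo(genomes: list[str]) -> str:
    """Return one longest substring present in every genome (forward strand only)."""
    if not genomes:
        return ""

    def shared_of_length(k: int):
        # Intersect the sets of all length-k substrings (k-mers) of each genome.
        first = genomes[0]
        common = {first[i:i + k] for i in range(len(first) - k + 1)}
        for g in genomes[1:]:
            common &= {g[i:i + k] for i in range(len(g) - k + 1)}
            if not common:
                return None
        return min(common)  # deterministic choice among equally long candidates

    # If a k-mer is shared by all genomes, so is its (k-1)-mer prefix,
    # so "some k-mer is shared" is monotone in k and k can be binary-searched.
    lo, hi = 0, min(len(g) for g in genomes)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = shared_of_length(mid)
        if found is None:
            hi = mid - 1
        else:
            best, lo = found, mid
    return best

print(longest_shared_oligo(["ACGTACGGT", "TTACGGTA", "GACGGTC"]))  # ACGGT
```

Building full k-mer sets is simple but memory-hungry for whole viral genomes; a generalized suffix tree or suffix automaton would scale better for real data.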
Stores viral sequences and serological data in RethinkDB. The fauna database and scripts are designed around influenza and Zika viruses. Fauna is part of the nextstrain project, which derives from nextflu, a project specific to influenza evolution. nextstrain comprises three components: (i) fauna, a database and IO scripts for sequence and serological data; (ii) augur, informatic pipelines to conduct inferences from raw data; and (iii) auspice, a web app to visualize the resulting inferences.
Assists users in managing HIV molecular data. HIVbase is standalone software for building a local database on the Windows operating system. The application handles DNA/amino acid sequences and their related data. It includes three main functions: (i) storing, for importing personalized data and identifying HIV proteins; (ii) querying, for analyzing and retrieving information; and (iii) annotating, for allowing entries to be defined as features such as text or decimal values.
Creates and updates Cyc databases. CycADS is an ad hoc data management system centred on a specific database model and on a set of Java programs to import, filter and export relevant information. This tool can: (i) help generate improved, computationally derived BioCyc databases; (ii) analyze the metabolism of the pea aphid; (iii) generate annotation files in other formats; and (iv) store the biological function data obtained for each protein from different annotation sources. It was used to generate the ‘AcypiCyc’ (11), ‘TricaCyc’ and ‘DromeCyc’ databases.
The leading website and database of Drosophila genes and genomes. FlyBase curates a variety of data from the published biological literature, including phenotypes, gene expression, interactions (genetic and physical), Gene Ontology (GO) information and many others. These data are organized into ∼31 different data-type reports, such as the Gene Report or the Allele Report. The range of data provided grows and changes as new types of data become available. Whether you are using the fruit fly Drosophila melanogaster as an experimental system or wish to understand Drosophila biological knowledge in relation to human disease or to other model systems, FlyBase can help you find the information you are looking for.
EBI / EMBL-EBI - The European Bioinformatics Institute
Provides access to several biological data resources and bioinformatics services. EBI is a platform that covers the entire range of biological sciences, from raw DNA sequences to curated proteins, chemicals, structures, systems, pathways, ontologies and literature. Databases, tools and web services are provided for sharing data, performing queries and analyzing results. Users can also deposit their data through a data submission page. With few exceptions, all resources are freely available without restriction.
A scientific database for the bacterium Escherichia coli K-12 MG1655. The EcoCyc project performs literature-based curation of the entire genome, and of transcriptional regulation, transporters, and metabolic pathways. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc supports running and modifying E. coli metabolic models directly on the EcoCyc website.
Aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation dataset, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 24 release contains 19,815 protein-coding loci, 19,941 long noncoding RNA (lncRNA) loci and 79,930 coding transcripts. It also has the most comprehensive publicly available annotation of lncRNA loci, with the predominant transcript form consisting of two exons.
CGD / Candida Genome Database
Provides gene, protein and sequence information for multiple Candida species. CGD contains web-based tools for accessing, analyzing and exploring these data, to facilitate and accelerate research into Candida pathogenesis and biology. Locus pages comprise a summary view along with several additional tabs that display more detailed information, including phenotype details, Gene Ontology term curation, protein product details for coding genes, notes on changes to the sequence or structure of the gene, a comprehensive reference list and the Homology Information tab, a place where phylogeny- and similarity-related data may be examined and evaluated.
Provides a resource for data analysis and visualization on a gene-by-gene or genome-wide scale. PlasmoDB is a functional genomic database for Plasmodium spp. It belongs to a family of genomic resources housed under the EuPathDB Bioinformatics Resource Center (BRC) umbrella. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop-down menus. The results of different queries can then be combined on the query history page.
Unifies existing genetic and physical maps with nucleotide and protein sequence databases in a way that should speed the discovery of genes underlying inherited human disease. The GeneMap is a database that provides mapping information and associated data and annotations. This resource is an important infrastructure and tool for the study of complex genetic traits, the positional cloning of disease genes and the cross-referencing of mammalian genomes, and it provides validated human transcribed sequences for large-scale studies of gene expression.
DDBJ / DNA Data Bank of Japan
Maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan.
Provides the biological research community with a comprehensive encyclopedia of genomic functional elements in the model organisms C. elegans and D. melanogaster. modENCODE is run as a Research Network; its consortium consists of 11 primary projects, divided between worm and fly, spanning the domains of gene structure, mRNA and ncRNA expression profiling, transcription factor binding sites (TFBS), histone modifications and replacement, chromatin structure, DNA replication initiation and timing, and copy number variation (CNV).
Giant Panda Database
Presents the entire panda genome sequence, as well as annotation information such as gene structures and functions, non-coding RNAs and repeat elements. The Giant Panda Database is displayed in a MapView powered by Google Web Toolkit. Polymorphisms detected in the diploid genome, such as single nucleotide polymorphisms (SNPs), indels and structural variations (SVs), are also presented. A module was developed for browsing large-scale short-read alignments, enabling users to track detailed divergences between the consensus and the sequencing reads.
Carica papaya
A database which offers gene annotation of Carica papaya. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease resistance gene analogues. Papaya unigenes from complementary DNA were aligned to the unmasked genome assembly, which was then used in training ab initio gene prediction software. Spliced alignments of proteins from the plant division of GenBank, and transcripts from related angiosperms, were generated. Gene predictions were combined with spliced alignments of proteins and transcripts to produce a reference gene set. Carica papaya belongs to the Caricaceae family.
Ricinus communis
A database which offers gene annotation of Ricinus communis, also known as the castor bean. The genome sequence assembly was searched for repetitive DNA using a combination of sequence alignment to databases of repetitive sequences and RepeatScout to identify repeats de novo. Overall, over 50% of the genome was identified as repetitive DNA (excluding low-complexity sequences), most of which could not be associated with known element families. Ricinus communis belongs to the Euphorbiaceae family.
Anolis carolinensis
A database which offers gene annotation of Anolis carolinensis, also known as the Carolina anole, an arboreal lizard. The anole lizard genome is composed of 13 chromosomes, assembled from 41.9861 contigs and 2.143 scaffolds. The total number of bases in the genome is 1.78 Gb. The gene set for the anole lizard was built using the Ensembl genebuild pipeline. In addition to the main set, gene models have been predicted for each tissue type using the RNA-Seq pipeline. Anolis carolinensis belongs to the Dactyloidae family.
Pleurobrachia bachei
Offers assembly and gene annotation of Pleurobrachia bachei, which is in the Pleurobrachiidae family. The Pleurobrachia bachei genome was sequenced and ~19,600 gene models were identified, 96% of which are supported by transcriptome data. The Pleurobrachia bachei draft genome was assembled using a custom approach designed to leverage the individual strengths of three popular de novo assembly packages and strategies: Velvet, SOAPdenovo, and pseudo-454 hybrid assembly with ABySS.
Yersinia pestis
Offers assembly and gene annotation of Yersinia pestis, which is in the Enterobacteriaceae family. Yersiniae consist of 11 species that have traditionally been distinguished by DNA-DNA hybridisation and biochemical analyses. The project behind the database generated reference genomes for two of the human-pathogenic Yersinia species: Y. pestis and Y. enterocolitica. The genome of Y. pestis is punctuated with pseudogenes, demonstrating that despite its high virulence, Y. pestis is in the early stages of genome decay, eliminating genes no longer required outside its mammalian host.
Ascaris suum
Offers assembly and gene annotation of Ascaris suum, also known as the large roundworm of pigs, which is in the Ascarididae family. The database reports the 273-megabase draft genome of Ascaris suum and compares it with other nematode genomes. This genome has low repeat content (4.4%) and encodes about 18,500 protein-coding genes. The A. suum secretome (about 750 molecules) is rich in peptidases linked to the penetration and degradation of host tissues, and in an assemblage of molecules likely to modulate or evade host immune responses. This genome provides a comprehensive resource to the scientific community and underpins the development of urgently needed interventions (drugs, vaccines and diagnostic tests) against ascariasis and other nematodiases.
IMG/M / Integrated Microbial Genomes with Microbiome Samples
A database for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context. IMG/M includes archaea, bacteria, eukarya, plasmids, viruses, genome fragments (partially sequenced genomes), as well as metagenome and metatranscriptome datasets. IMG performs feature prediction, including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements.
A lightweight, comprehensive genome resource and sequence analysis platform for oomycete organisms. EuMicrobedbLite is a successor of the VBI Microbial Database (VMD) that was built using the Genome Unified Schema (GUS). This database has 26 publicly available genomes and 10 EST datasets of oomycete organisms. The browser page has dynamic tracks presenting comparative genomics analyses, coding and non-coding data, tRNA genes, repeats and EST alignments. In addition, 44,777 core conserved proteins from twelve oomycete organisms were defined, forming 2,974 clusters. The user interface has undergone major changes for ease of browsing. Queryable comparative genomics information, conserved orthologous genes and pathways are among the key new features of this database. Annotations for the organisms are updated every six months to ensure quality.
CGD / Cucurbit Genomics Database
A database which offers gene annotation of cucurbits. It offers the genomes of melon (Cucumis melo), cucumber (Cucumis sativus), watermelon (Citrullus lanatus) and pumpkin (Cucurbita maxima). The Cucurbitaceae consist of 98 proposed genera with 975 species, mainly in tropical and subtropical regions. All species are sensitive to frost. Most plants in this family are annual vines, but some are woody lianas, thorny shrubs, or trees (Dendrosicyos). Cucurbits belong to the Cucurbitaceae family.
Provides an easy way to access the sequences and comprehensive annotation data on the structures of cyanobacterial genomes. It contains genomic sequences from 376 cyanobacterial species, comprising 86 complete and 290 draft genomes. The user interface was optimized for large genomic data and uses semantic web technologies and JBrowse. CyanoBase focuses on the representation and reusability of reference genome annotations, which are continuously updated by manual curation. Advanced users can also retrieve this information in an automated manner through the REST (representational state transfer)-based web application programming interface (API).
PATRIC / Pathosystems Resource Integration Center
Aims to assist scientists in infectious-disease research. PATRIC is a National Institutes of Health (NIH)-supported bioinformatics resource center built to enable comparative genomic analysis of bacterial pathogens. The database provides researchers with an online resource that stores and integrates a variety of data types (e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data) and associated metadata. Tools and services for bacterial infectious disease research are also available.
MBGD / Microbial Genome Database
Gathers information related to complete microbial genomes. MBGD is a repository that focuses on assisting researchers in comparing genomic information, providing data on both prokaryotic and eukaryotic microbes as well as four multicellular eukaryotes. The database contains precomputed ortholog tables and lets users generate their own. It also includes MyMBGD, a function for submitting users’ own data to the server and performing a customized ortholog analysis.
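MBGD's own method for building its ortholog tables is not described in this entry, and the sketch below does not reproduce it. As a simpler, commonly used stand-in, it calls pairwise orthologs between two genomes as reciprocal best hits from precomputed similarity scores. The gene names, scores and function name are all hypothetical.

```python
def reciprocal_best_hits(scores: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Call ortholog pairs between genome A and genome B as reciprocal best hits.

    scores maps (gene_in_A, gene_in_B) to a similarity score (higher is better),
    e.g. a precomputed alignment score. Ties keep the first pair encountered.
    """
    best_for_a: dict[str, tuple[str, float]] = {}
    best_for_b: dict[str, tuple[str, float]] = {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # Keep (a, b) only when b is a's best hit AND a is b's best hit.
    return sorted((a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a)

# Hypothetical genes and scores.
scores = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b1"): 95,
          ("a2", "b2"): 40, ("a3", "b2"): 60}
print(reciprocal_best_hits(scores))  # [('a2', 'b1'), ('a3', 'b2')]
```

Here a1's best hit is b1, but b1 prefers a2, so a1 gets no ortholog call. Reciprocal best hits is strictly one-to-one, so it cannot represent gene families with in-paralogs, which is one reason dedicated ortholog-clustering methods exist.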
SGD / Saccharomyces Genome Database
Compiles comprehensive integrated biological information about the budding yeast Saccharomyces cerevisiae. SGD is a manually-curated database which aims to improve the discovery of functional relationships between sequence and gene products in fungi and higher organisms. The database records information about the yeast genome and its genes, proteins, and other encoded features. Moreover, it contains several bioinformatic tools to facilitate experimental design and analysis.
A centralized gene-annotation portal that enables researchers to access distributed gene annotation resources. Compared with other gene portals, BioGPS is distinguished by its community extensibility and user customizability. Users contribute the gene-specific resources accessible from BioGPS (‘plugins’), which helps keep the resource collection up to date and lets it continue expanding over time. BioGPS users can create their own collections of relevant plugins and save them as customized gene-report pages, or ‘layouts’. In addition, the most popular plugin, the ‘Gene expression/activity chart’, was recently updated to include ∼6000 datasets (up from ∼2000), and user interactivity was enhanced.
GCGene / Gastric Cancer Gene database
A literature-based database with comprehensive annotations supported by a user-friendly website. The current release contains 1,815 unique human genes, including 1,678 protein-coding and 137 non-coding genes, curated from an extensive examination of 3,142 PubMed abstracts. The database has a convenient web-based interface that supports both textual and sequence-based searches. All curated genes in GCGene are downloadable for advanced bioinformatics data mining. Gene prioritization was performed to rank the relative relevance of these genes to gastric cancer development.
