Searches a protein database using a translated nucleotide query. BLASTX is a BLAST search application that compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. This application can also work in Blast2Sequences mode and can send BLAST searches over the network to the public NCBI server if desired.
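The six-frame conceptual translation mentioned above can be illustrated with a minimal Python sketch. This is not BLASTX's own code: it assumes the standard genetic code, writes unknown codons as "X", and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings would then be compared against the protein database; the search itself is outside the scope of this sketch.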
Finds regions of sequence similarity. PSI-BLAST is a protein database search program. The software assesses the probable substitutions at each sequence position using the results of a previous gapped BLAST search, an algorithm that scores alignments with an amino acid substitution matrix. It combines search results with robust statistics to build and apply profiles, also known as position-specific scoring matrices (PSSMs). A modified version of PSI-BLAST, PSI-BLASTexB, which addresses limitations of the sequence weighting scheme, was also developed.
Permits functional annotation, management, and data mining of novel sequence data. Blast2GO is based on a common controlled vocabulary schema, the Gene Ontology (GO). It takes into consideration similarity, the extension of the homology, the database of choice, the GO hierarchy, and the quality of the original annotations. This tool is suitable for plant genomics research. It generates functional annotation and helps users assess the functional meaning of their experimental results.
Allows for the design of CRISPR guide RNA libraries that can be used to edit coding and noncoding genomic regions. GuideScan produces high-density sets of guide RNAs (gRNAs) for single- and paired-gRNA genome-wide screens. Rather than using an alignment tool, GuideScan uses a retrieval tree (trie) data structure, which efficiently and precisely enumerates all targetable sequences present in a given genome. Traversals of the trie allow for the computation of sequence mismatch neighborhoods, which are used to construct databases of gRNAs whose target sites are unique in the genome up to a user-defined number of mismatches. The GuideScan website allows users to input coordinates of genomic features in batch, to choose between designing single internal gRNAs or pairs of flanking gRNAs, and retrieve for each genomic coordinate a pre-defined number of gRNAs or gRNA pairs.
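The trie-based mismatch neighborhood described above can be sketched in a few lines of Python. This is an illustrative simplification, not GuideScan's implementation: it indexes plain k-mers from one strand, ignores PAM requirements, and all function names are hypothetical.

```python
def build_trie(sequences):
    """Insert each sequence into a nested-dict trie; '$' holds a leaf count."""
    root = {}
    for seq in sequences:
        node = root
        for base in seq:
            node = node.setdefault(base, {})
        node["$"] = node.get("$", 0) + 1
    return root


def neighbors(trie, query, max_mismatches):
    """Return {stored sequence: count} within max_mismatches of query."""
    hits = {}

    def walk(node, depth, mismatches, prefix):
        if depth == len(query):
            if "$" in node:
                hits[prefix] = node["$"]
            return
        for base, child in node.items():
            if base == "$":
                continue
            cost = mismatches + (base != query[depth])
            # Prune any branch that already exceeds the mismatch budget.
            if cost <= max_mismatches:
                walk(child, depth + 1, cost, prefix + base)

    walk(trie, 0, 0, "")
    return hits


def kmers(genome, k):
    """List every k-mer of the genome, in order of occurrence."""
    return [genome[i:i + k] for i in range(len(genome) - k + 1)]


def is_unique(trie, candidate, max_mismatches):
    """True if the candidate is the only site in its mismatch neighborhood."""
    return sum(neighbors(trie, candidate, max_mismatches).values()) == 1


if __name__ == "__main__":
    trie = build_trie(kmers("ACGTACGAAC", 4))
    print(neighbors(trie, "ACGT", 1))
    print(is_unique(trie, "ACGT", 0), is_unique(trie, "ACGT", 1))
```

Because branches are pruned as soon as the mismatch budget is spent, one traversal enumerates the whole neighborhood without aligning the candidate against every genomic position.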
Aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters. antiSMASH facilitates the mining of bacterial and fungal genomes. It includes gene cluster boundary prediction for fungal biosynthetic gene clusters (BGCs), improved chemistry predictions for terpene, ribosomal peptide and non-ribosomal peptide BGCs, comparative alignment of trans-AT polyketide synthase (PKS) assembly lines and TTA codon annotation. A user interface was also introduced.
A computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty.
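The final step, combining per-organism gene family predictions into a composite metagenome, can be sketched as follows. This is a simplified illustration rather than PICRUSt's code: it assumes the ancestral-state reconstruction has already produced gene family counts and 16S copy numbers for each OTU, and the function name and input layout are hypothetical.

```python
def predict_metagenome(otu_counts, copy_number_16s, gene_content):
    """Estimate gene family abundances for one sample.

    otu_counts:      {otu_id: 16S read count in the sample}
    copy_number_16s: {otu_id: predicted 16S copies per genome}
    gene_content:    {otu_id: {gene_family: predicted copies per genome}}
    """
    metagenome = {}
    for otu, count in otu_counts.items():
        # Convert 16S read counts into an estimate of organism abundance.
        organisms = count / copy_number_16s[otu]
        for family, copies in gene_content[otu].items():
            metagenome[family] = metagenome.get(family, 0.0) + organisms * copies
    return metagenome


if __name__ == "__main__":
    sample = {"otu1": 10, "otu2": 6}
    copies_16s = {"otu1": 5, "otu2": 2}
    content = {"otu1": {"K1": 1, "K2": 2}, "otu2": {"K1": 3}}
    print(predict_metagenome(sample, copies_16s, content))
```

The sketch returns point estimates only; it does not reproduce the uncertainty quantification mentioned above.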
Integrates BioMart data resources with data analysis software in Bioconductor. BiomaRt can annotate a wide range of gene or gene product identifiers with information such as gene symbol, chromosomal coordinates, Gene Ontology and Online Mendelian Inheritance in Man (OMIM) annotation. Furthermore, biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible as the package executes direct SQL queries to the BioMart databases. The biomaRt package provides a tight integration of large, public or locally installed BioMart databases with data analysis in Bioconductor creating a powerful environment for biological data mining.
Provides a generalized linear model for functional genomic data and genome annotations. LINSIGHT is a computational method that outperforms state-of-the-art prediction methods in the task of prioritizing noncoding disease variants from the Human Gene Mutation Database (HGMD) and the National Center for Biotechnology Information (NCBI) ClinVar database. By integrating a large number of genomic features, LINSIGHT provides a precise, high-resolution description of the fitness consequences of noncoding mutations in the human genome.
Allows management, analysis, simulation and visualization of integrated collections of genome, pathway and regulatory data. Pathway Tools is a bioinformatics software environment built around a type of model-organism database called a Pathway/Genome Database (PGDB). The software can manipulate genome data, metabolic networks and regulatory networks. For each datatype, it provides query, visualization, editing and analysis functions. It also provides visual tools for analysis of omics data sets, and tools for the analysis of biological networks.
Permits "gene-centric" annotation of the human genome for laboratory and analytical work carried out at the Core Genotyping Facility (CGF) of the National Cancer Institute. Genewindow integrates data available in the public databases with internal annotations from sequence data generated by our laboratory. It is configured for the human genome but can be applied to other genomes and integrated with the analysis, storage and archiving of data generated in any laboratory setting.
Calculates in silico the extent of identity between two genomes. JSpeciesWS is able to determine overall genome relatedness indices (OGRI). It allows rapid comparisons against the reference database offered by the tool, providing a list of the most similar genomes based on their resulting Tetra-nucleotide signature correlation index. This database is composed of NCBI’s genomic sequence data and includes all primary submissions of assembled genome sequences and their associated annotation data.
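The idea of a tetranucleotide signature correlation can be illustrated with a minimal Python sketch. It is a simplification and not the JSpeciesWS implementation: it correlates raw tetranucleotide frequencies counted on both strands, without any normalization against expected frequencies, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tetra_frequencies(seq):
    """Relative frequency of each of the 256 tetranucleotides, both strands."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 3):
            word = strand[i:i + 4]
            if word in counts:  # skips words containing N or other symbols
                counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def tetra_correlation(genome_a, genome_b):
    """Correlation between the tetranucleotide signatures of two genomes."""
    return pearson(tetra_frequencies(genome_a), tetra_frequencies(genome_b))


if __name__ == "__main__":
    a = "ACGTTGCAAGGCTTAACCGGATAT" * 3
    b = "TTTTGGGGCCCCAAAATTGGCCAA" * 3
    print(tetra_correlation(a, a), tetra_correlation(a, b))
```

Ranking every reference genome by this coefficient against the query would give a most-similar list of the kind described above.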
Simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. It represents a substantial improvement over existing methods, enabling rapid annotation of whole-genome and whole-exome datasets and providing substantial analytical power to studies of disease, population genetics, and evolution.
A comparative tool for analyzing the regulatory potential of noncoding sequences. Our ability to experimentally identify functional noncoding sequences is extremely limited; rVISTA therefore attempts to fill this gap in genomic analysis by offering a powerful approach for eliminating the TFBSs least likely to be biologically relevant. rVISTA analysis proceeds in four main steps: (i) detect TFBS matches in each individual sequence using PWMs from the TRANSFAC database, (ii) identify pairs of locally aligned TFBSs, (iii) select TFBSs present in regions of high DNA conservation and (iv) create a graphical display that dynamically overlays individual or clustered TFBSs with the conservation profile of the genomic locus. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences.
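Steps (i) and (iii) above can be sketched in Python as a PWM scan followed by a conservation filter. This is an illustrative simplification, not rVISTA's code: the matrix is a made-up example rather than a TRANSFAC entry, the pairwise alignment of step (ii) is skipped, conserved regions are supplied as precomputed half-open intervals, and all names are hypothetical.

```python
from math import log2


def score_site(site, pwm, background=0.25):
    """Log-odds score of a site; pwm is a list of {base: probability} dicts."""
    return sum(log2(pwm[i][base] / background) for i, base in enumerate(site))


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = score_site(seq[start:start + width], pwm)
        if score >= threshold:
            hits.append((start, score))
    return hits


def keep_conserved(hits, width, conserved_regions):
    """Keep hits lying entirely inside a conserved [start, end) interval."""
    return [
        (start, score)
        for start, score in hits
        if any(a <= start and start + width <= b for a, b in conserved_regions)
    ]


def consensus_pwm(consensus, match=0.85):
    """Build a toy PWM around a consensus string (assumption, not TRANSFAC)."""
    other = (1.0 - match) / 3
    return [{b: (match if b == c else other) for b in "ACGT"} for c in consensus]


if __name__ == "__main__":
    pwm = consensus_pwm("TATA")
    hits = scan("GGTATACCTATAGG", pwm, threshold=5.0)
    print(hits)
    print(keep_conserved(hits, len(pwm), [(0, 7)]))
```

Every probability in the toy matrix is non-zero, so the log-odds score is always defined; a matrix built from real counts would need pseudocounts for the same reason.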
Predicts prokaryotic species based on the number of overlapping (co-occurring) k-mers between the query genome and genomes in a reference database. KmerFinderJS, an implementation of KmerFinder, is a method for identifying bacterial species in whole-genome data. The software uses Redis, a centralised in-memory database, to store the reference database.
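The k-mer overlap scoring described above can be sketched as follows. This is a minimal illustration, not the KmerFinder or KmerFinderJS code: a plain Python dictionary stands in for the Redis-backed reference database, the default k-mer length is an arbitrary assumption, and the function names are hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(query, references, k=16):
    """Rank reference genomes by the number of k-mers shared with the query.

    references: {species name: genome sequence}
    Returns (name, shared k-mer count) pairs, best match first.
    """
    query_kmers = kmer_set(query, k)
    scores = {
        name: len(query_kmers & kmer_set(genome, k))
        for name, genome in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    refs = {"species_A": "ACGTACGTGG", "species_B": "TTTTTTTTTT"}
    print(rank_references("ACGTACGT", refs, k=4))
```

In practice the reference k-mers would be indexed once and looked up per query k-mer rather than recomputed for every search, which is the role an in-memory store plays in the design described above.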
Allows data-mining and visualization of next-generation sequencing (NGS) samples such as enrichment patterns of DNA-interacting proteins at functional genomic regions. ngs.plot has a built-in database of functional elements that facilitates the management of genomic coordinates for users. This software supports large sequencing data and is available through the Galaxy tool shed.
A method that uses a subset of marker genes (MGs) for taxonomic profiling of metagenomes. mOTU is available as standalone software and is also implemented in MOCAT. Species-level profiles are generated by mapping reads from metagenomes to a database (mOTU.v1.padded) consisting of 10 MGs extracted from 3,496 prokaryotic reference genomes (downloaded from NCBI) and 263 publicly available metagenomes (from the MetaHIT and HMP projects).
Provides an integrated view of omics datasets based on genomic coordinate axes. OmicBrowse is a genome browser that integrates multiple heterogeneous databases into a single omic space. The software employs a graphical interface that supports effective genome-wide analysis of the various data records stored in multiple databases. It can be installed on a user's PC and thus can serve as the user's private database.
Helps users organize and centralize all variant data and annotations from their lab. Highlander provides researchers with several tools for filtering information. This tool, coupled to a local MySQL database, aims to classify all variant data coming from exome and whole-genome sequencing experiments. It also supplies annotation and visualization functions that help detect changes of interest among the complete list of variants detected in a sample.
An implementation of the whole genome-based, alignment-free composition vector (CV) method for phylogenetic analysis. Users can upload their own sequences to find their phylogenetic position among genomes selected from the server's inbuilt database. All sequence data used in a session may be downloaded as a compressed file. In addition to standard phylogenetic trees, users can also choose to output trees whose monophyletic branches are collapsed to various taxonomic levels. This feature is particularly useful for comparing phylogeny with taxonomy when dealing with thousands of genomes.
It is the leading website and database of Drosophila genes and genomes. FlyBase curates a variety of data from published biological literature, including phenotype, gene expression, interactions (genetic and physical), gene ontology (GO) information and many others. These data are organized in ∼31 different data-type reports such as the Gene Report or the Allele Report. The range of data we provide increases and changes as new types of data become available. Whether you are using the fruit fly Drosophila melanogaster as an experimental system or wish to understand Drosophila biological knowledge in relation to human disease or to other model systems, FlyBase can help you successfully find the information you are looking for.
Provides publicly available nucleotide sequences for formally described species. GenBank is a comprehensive public database of nucleotide sequences. It also supports bibliographic and biological annotations. The sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects.
Generates, analyzes, and makes available genomic sequence, expression, methylation, and copy number variation (CNV) data on over 11,000 individuals who represent over 30 different types of cancer. The information generated by TCGA is centrally managed and entered into databases as it becomes available, making the data rapidly accessible to the entire research community. TCGA is a collaborative effort led by the National Cancer Institute and the National Human Genome Research Institute to map the genomic and epigenomic changes that occur in types of human cancer, including nine rare tumors. Its goal is to support new discoveries through the generation of a catalog of somatic aberrations occurring in the different neoplasms, and accelerate the pace of research aimed at improving the diagnosis, treatment, and prevention of cancer.
Provides high-throughput microarray and next-generation sequence (NGS) functional genomic data sets. GEO archives raw data, processed data and metadata submitted by the research community. Its data are indexed, cross-linked and searchable. This database gives access to several tools and graphical renderings allowing users to easily explore and interpret data available on the platform. It can be useful to develop and test new hypotheses.
Displays assembled human and other mammalian genomes. UCSC Genome Browser provides browsers for more than 180 assemblies and over 100 species. It provides a collection of tools to explore genomes and conduct analyses including a data integrator to merge and export data from multiple tracks. The platform aims to develop mechanisms for mapping annotations from the reference assembly to the corresponding patch and haplotype regions.
Provides a bioinformatics framework to organise biology around the sequences of large genomes. Ensembl is a comprehensive source of stable automatic annotation of genome sequences, available as either an interactive website or as flat files. It can integrate manually annotated gene structures from external sources where available. This resource includes access to all services and documentation, including the REST API and BioMart.
Gathers information about transposable elements (TEs) and other types of repeats in eukaryotic genomes. Repbase is an online database that can be used for eukaryotic genome sequence analyses and in studies concerning the evolution of TEs and their impact on genomes. This repository contains more than 38,000 sequences of different families or subfamilies.
Supplies several online resources for biological information. NCBI is a web-based platform gathering information, tools, and functions that can be useful for biology researchers. It allows users to: (1) submit data or manuscripts to NCBI databases; (2) download data from NCBI databases; (3) search scholarly documents or projects; (4) build applications with the help of NCBI APIs and code; and (5) find tools to analyze their own data.
Gives access to genome sequences and annotations, and allows exploration of genomic data. JGI Genome Portal furnishes worldwide statistics on the usage of the JGI resources and information about the latest genome releases and new tool development. It can automatically generate and monitor BioSample and BioProject submissions to NCBI. This database also gives users access to other resources such as the Genomes OnLine Database (GOLD).
A scientific database for the bacterium Escherichia coli K-12 MG1655. The EcoCyc project performs literature-based curation of the entire genome, and of transcriptional regulation, transporters, and metabolic pathways. New experimental discoveries about gene products, their function and regulation, new metabolic pathways, enzymes and cofactors are regularly added to EcoCyc. SmartTable tools allow users to browse collections of related EcoCyc content. SmartTables can also serve as repositories for user- or curator-generated lists. EcoCyc supports running and modifying E. coli metabolic models directly on the EcoCyc website.
Provides aligned and annotated ribosomal RNA (rRNA) gene sequence data, along with tools that allow researchers to analyze their own rRNA gene sequences. RDP offers tools for browsing and searching the data collections, for taxonomic classification and nearest neighbor search, for primer-probe testing and for tree building. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics.
Provides a database of functional genomics experiments. ArrayExpress includes data generated by sequencing or array-based technologies. This resource integrates the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology-enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.
A database which offers gene annotation of Ricinus communis, also known as castor bean. The genome sequence assembly was searched for repetitive DNA using a combination of sequence alignment to databases of repetitive sequences and RepeatScout to identify repeats de novo. Overall, over 50% of the genome was identified as repetitive DNA (excluding low-complexity sequences), most of which could not be associated with known element families. Ricinus communis belongs to the Euphorbiaceae family.
Provides clusters of orthologous groups (COGs) and updated annotation of those COGs. COGs is a database where organisms are sorted according to the NCBI Taxonomy database. Each gene entry in a COG is now denoted by its gene index (gi) number in the NCBI protein database and is linked to the respective entry in the NCBI’s RefSeq database. It concentrates on prokaryotes (bacteria and archaea).
A lightweight, comprehensive genome resource and sequence analysis platform for oomycete organisms. EuMicrobedbLite is a successor of the VBI Microbial Database (VMD) that was built using the Genome Unified Schema (GUS). This database has 26 publicly available genomes and 10 EST datasets of oomycete organisms. The browser page has dynamic tracks presenting comparative genomics analyses, coding and non-coding data, tRNA genes, repeats and EST alignments. In addition, 44,777 core conserved proteins were defined from twelve oomycete organisms that form 2,974 clusters. The user interface has undergone major changes for ease of browsing. Queryable comparative genomics information, conserved orthologous genes and pathways are among the new key features updated in this database. Annotations for the organisms are updated once every six months to ensure quality.
Offers annotation for over 95 000 genomes. RefSeq assigns informative names to genes, provides some annotation for every gene found in each genome it analyzes, and supports comparative studies by using consistent structural and functional annotation methods. This database uses tailored data models and process flows to deliver reference collections for eukaryotes, viruses and prokaryotes.
Provides a resource for data analysis and visualization on a gene-by-gene or genome-wide scale. PlasmoDB is a functional genomic database for Plasmodium spp. It belongs to a family of genomic resources that are housed under the EuPathDB Bioinformatics Resource Center (BRC) umbrella. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop-down menus. Various results can then be combined with each other on the query history page.
Offers assembly and gene annotation of Yersinia pestis, which is in the Enterobacteriaceae family. Yersiniae consist of 11 species that have been traditionally distinguished by DNA-DNA hybridisation and biochemical analyses. The database generates reference genomes for two of the human pathogenic Yersinia: Y. pestis and Y. enterocolitica. The genome of Y. pestis is punctuated with pseudogenes, demonstrating that despite its high virulence Y. pestis is in the early stage of genome decay, eliminating genes no longer required outside its mammalian host.
Offers assembly and gene annotation of Hydra magnipapillata, which is in the family Hydridae. The Hydra genome has been shaped by bursts of transposable element expansion, horizontal gene transfer, trans-splicing, and simplification of gene structure and gene content that parallel simplification of the Hydra life cycle. The database reports the sequence of the genome of a novel bacterium stably associated with H. magnipapillata.
Offers assembly and gene annotation of Pleurobrachia bachei, which is in the Pleurobrachiidae family. The database sequences the Pleurobrachia bachei genome and identifies ~19,600 gene models, 96% of which are supported by transcriptome data. The Pleurobrachia bachei draft genome was assembled using a custom approach designed to leverage the individual strengths of three popular de novo assembly packages and strategies: Velvet, SOAPdenovo, and pseudo-454 hybrid assembly with ABySS.