Encrypts a genetic sequence of interest and permits users to search databases confidentially. SIG-DB compares the encrypted sequence with each item in the chosen database and computes an encrypted similarity score, allowing users to make private sequence-to-sequence comparisons. The tool provides a solution for secure multi-party exchange of information. It is composed of two distinct parts: a Querier and a Database Owner.
Searches a protein database using a translated nucleotide query. BLASTX is a BLAST search application that compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. The application can also run in Blast2Sequences mode and, if desired, can send BLAST searches over the network to the public NCBI servers.
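BLASTX performs this translation internally. The sketch below only illustrates what a six-frame conceptual translation is, assuming the standard genetic code (NCBI translation table 1); the function names are hypothetical and this is not BLASTX code.

```python
# Standard genetic code (NCBI table 1), indexed by codon with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; codons containing ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return the conceptual translations of the three forward and three reverse frames."""
    seq = seq.upper()
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev_comp[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCC")` yields `MA`, `W` and `G` on the forward frames and `GH`, `A` and `P` on the reverse-complement frames; each of these six peptides is what a translated search would compare against the protein database.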
Allows users to find regions of sequence similarity. PSI-BLAST is a protein database search program. It estimates the probable substitutions at each sequence position from the results of a previous Gapped BLAST search (an alignment algorithm that scores matches with an amino acid substitution matrix). It combines search results with robust statistics to build and apply profiles, also known as position-specific scoring matrices (PSSMs). A modified version of PSI-BLAST, PSI-BLASTexB, which addresses limitations of the sequence weighting scheme, was also developed.
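To make the idea of a position-specific scoring matrix concrete, here is a minimal sketch that builds log-odds scores from a small gap-free alignment. It deliberately simplifies what PSI-BLAST does: all sequences get equal weight, pseudocounts are a flat constant, and background frequencies are assumed uniform (1/20). The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, gap-free aligned sequences.

    Assumptions (not PSI-BLAST's actual procedure): equal sequence weights,
    a constant pseudocount, and a uniform background of 1/20 per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score(pssm, seq):
    """Score an ungapped sequence against the profile, position by position."""
    return sum(pssm[i][aa] for i, aa in enumerate(seq))
```

Residues that are conserved in a column receive positive scores and unseen residues negative ones, so a profile built from `["ACD", "ACD", "ACE"]` scores `ACD` well above an unrelated sequence such as `WWW`. In an iterative search, the profile would be rebuilt from each round's hits.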
Permits functional annotation, management, and data mining of novel sequence data. Blast2GO is based on a common controlled vocabulary schema, the Gene Ontology (GO). It takes into consideration sequence similarity, the extent of homology, the database of choice, the GO hierarchy, and the quality of the original annotations. This tool is suitable for plant genomics research. It generates functional annotations and helps users assess the functional meaning of their experimental results.
Allows for the design of CRISPR guide RNA libraries that can be used to edit coding and noncoding genomic regions. GuideScan produces high-density sets of guide RNAs (gRNAs) for single- and paired-gRNA genome-wide screens. Rather than using an alignment tool, GuideScan uses a retrieval tree (trie) data structure, which efficiently and precisely enumerates all targetable sequences present in a given genome. Traversals of the trie allow for the computation of sequence mismatch neighborhoods, which are used to construct databases of gRNAs whose target sites are unique in the genome up to a user-defined number of mismatches. The GuideScan website allows users to input coordinates of genomic features in batch, to choose between designing single internal gRNAs or pairs of flanking gRNAs, and to retrieve for each genomic coordinate a predefined number of gRNAs or gRNA pairs.
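The trie-based mismatch search described above can be sketched as follows. This is a simplified illustration, not GuideScan's implementation: it indexes one strand only, ignores the PAM requirement, and uses hypothetical function names. Each branch is abandoned as soon as it exceeds the mismatch budget, which is what makes trie traversal cheaper than comparing a guide against every site.

```python
class TrieNode:
    """One node of a retrieval tree (trie) over fixed-length target sites."""

    def __init__(self):
        self.children = {}   # base -> child TrieNode
        self.positions = []  # start offsets of sites that end at this node


def build_trie(genome, k):
    """Insert every length-k substring of the genome as a candidate site."""
    root = TrieNode()
    for start in range(len(genome) - k + 1):
        node = root
        for base in genome[start:start + k]:
            node = node.children.setdefault(base, TrieNode())
        node.positions.append(start)
    return root


def mismatch_neighborhood(root, guide, max_mismatches):
    """Return (site, mismatches, positions) for every site within the budget."""
    hits = []

    def walk(node, depth, mismatches, prefix):
        if mismatches > max_mismatches:
            return  # prune: no site below this branch can satisfy the budget
        if depth == len(guide):
            hits.append((prefix, mismatches, node.positions))
            return
        for base, child in node.children.items():
            walk(child, depth + 1, mismatches + (base != guide[depth]), prefix + base)

    walk(root, 0, 0, "")
    return hits


def is_unique(root, guide, max_mismatches):
    """A guide is unique if exactly one genomic site lies within the budget."""
    hits = mismatch_neighborhood(root, guide, max_mismatches)
    return sum(len(positions) for _, _, positions in hits) == 1
```

With the toy genome `ACGTACGTTT` and k = 4, the site `CGTA` is unique at zero mismatches but not at one mismatch, because `CGTT` differs from it by a single base.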
Aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters. antiSMASH facilitates the mining of bacterial and fungal genomes. It includes gene cluster boundary prediction for fungal biosynthetic gene clusters (BGCs), improved chemistry predictions for terpene, ribosomal peptide and non-ribosomal peptide BGCs, comparative alignment of trans-AT polyketide synthase (PKS) assembly lines and TTA codon annotation. A user interface was also introduced.
A computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty.
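The final "combine gene families" step can be sketched as below, assuming the ancestral-state reconstruction has already produced, for each OTU, a predicted 16S rRNA copy number and a predicted gene-family content. Dividing read counts by 16S copy number before weighting gene content is an assumption of this sketch (a common correction in this kind of prediction); the tool's exact procedure may differ, and all identifiers and values are hypothetical.

```python
def predict_metagenome(otu_counts, copy_numbers, gene_content):
    """Estimate gene-family abundances from 16S-based OTU counts.

    otu_counts:   {otu_id: observed 16S read count}
    copy_numbers: {otu_id: predicted 16S rRNA gene copy number}
    gene_content: {otu_id: {gene_family: predicted copies per genome}}
    """
    metagenome = {}
    for otu, count in otu_counts.items():
        # Dividing by 16S copy number converts read counts to relative cell counts.
        cells = count / copy_numbers[otu]
        for family, copies in gene_content[otu].items():
            metagenome[family] = metagenome.get(family, 0.0) + cells * copies
    return metagenome
```

For instance, 40 reads from an OTU with four 16S copies and 10 reads from an OTU with one copy both correspond to 10 relative cells, so a gene family carried once by the first and three times by the second receives an abundance of 40.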
Integrates BioMart data resources with data analysis software in Bioconductor. biomaRt can annotate a wide range of gene or gene product identifiers with information such as gene symbol, chromosomal coordinates, Gene Ontology and Online Mendelian Inheritance in Man (OMIM) annotation. Furthermore, biomaRt enables retrieval of genomic sequences and single nucleotide polymorphism information, which can be used in data analysis. Fast and up-to-date data retrieval is possible because the package executes direct SQL queries to the BioMart databases. The biomaRt package tightly integrates large public or locally installed BioMart databases with data analysis in Bioconductor, creating a powerful environment for biological data mining.
Provides a generalized linear model that combines functional genomic data and genome annotations. LINSIGHT is a computational method that outperforms state-of-the-art prediction methods in prioritizing noncoding disease variants from the Human Gene Mutation Database (HGMD) and the National Center for Biotechnology Information (NCBI) ClinVar database. By integrating a large number of genomic features, LINSIGHT provides a precise, high-resolution description of the fitness consequences of noncoding mutations in the human genome.
Allows management, analysis, simulation and visualization of integrated collections of genome, pathway and regulatory data. Pathway Tools is a bioinformatics software environment built around a type of model-organism database called a Pathway/Genome Database (PGDB). The software can manipulate genome data, metabolic networks and regulatory networks. For each data type, it provides query, visualization, editing and analysis functions. It also provides visual tools for analyzing omics datasets and tools for analyzing biological networks.
Permits ‘gene-centric’ annotation of the human genome for laboratory and analytical work carried out at the Core Genotyping Facility (CGF) of the National Cancer Institute. Genewindow integrates data available in public databases with internal annotations from sequence data generated by the CGF laboratory. It is configured for the human genome but can be applied to other genomes and integrated with the analysis, storage and archiving of data generated in any laboratory setting.
Calculates in silico the extent of identity between two genomes. JSpeciesWS determines overall genome relatedness indices (OGRI). It allows rapid comparisons against the tool's reference database and returns a list of the most similar genomes ranked by their tetranucleotide signature correlation index. This database is composed of NCBI genomic sequence data and includes all primary submissions of assembled genome sequences and their associated annotation data.
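A tetranucleotide signature correlation compares how often each of the 256 four-base words occurs in two genomes. The published TETRA index is usually computed on z-score-normalized tetranucleotide usage; this sketch substitutes plain relative frequencies (counted on both strands) to keep the idea visible, so its values will not match those reported by JSpeciesWS. Function names are hypothetical.

```python
import math
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tetra_frequencies(seq):
    """Relative frequencies of the 256 tetranucleotides, counted on both strands."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - 3):
            word = strand[i:i + 4]
            if word in counts:  # skip words containing N or other symbols
                counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def tetra_correlation(genome_a, genome_b):
    """Simplified tetranucleotide signature correlation between two sequences."""
    return pearson(tetra_frequencies(genome_a), tetra_frequencies(genome_b))
```

A genome compared with itself gives a correlation of 1, while sequences with very different word usage give low or negative values; ranking reference genomes by this value is the basis of the "most similar genomes" list.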
Simplifies the annotation of genetic variants in VCF format. Vcfanno extracts and summarizes multiple attributes from one or more annotation files and appends the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. It represents a substantial improvement over existing methods, enabling rapid annotation of whole-genome and whole-exome datasets and providing analytical power for studies of disease, population genetics, and evolution.
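The last step described above, appending annotations to the INFO field, can be sketched in a few lines. This does not reproduce Vcfanno's configuration or overlap logic; it only shows how extra key=value pairs (and bare flags) end up in the eighth, semicolon-separated column of a VCF data line. The function name and annotation keys are hypothetical.

```python
def append_info(vcf_line, annotations):
    """Append key=value annotations to the INFO column (8th field) of a VCF line.

    True values are written as bare flags, lists are comma-joined, and
    None/False values are skipped.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    extra = []
    for key, value in annotations.items():
        if value is None or value is False:
            continue
        if value is True:
            extra.append(key)
        elif isinstance(value, (list, tuple)):
            extra.append(f"{key}={','.join(str(v) for v in value)}")
        else:
            extra.append(f"{key}={value}")
    if extra:
        # A "." INFO value means "empty", so it is replaced rather than extended.
        existing = [] if fields[7] == "." else [fields[7]]
        fields[7] = ";".join(existing + extra)
    return "\t".join(fields)
```

Applied to a record whose INFO is `DP=10`, the annotations `{"gnomad_af": 0.01, "in_clinvar": True}` produce `DP=10;gnomad_af=0.01;in_clinvar`, while sample columns after INFO are left untouched.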
A comparative tool for analyzing the regulatory potential of noncoding sequences. Because the ability to identify functional noncoding sequences experimentally is extremely limited, rVISTA attempts to fill this gap in genomic analysis by eliminating the transcription factor binding sites (TFBSs) least likely to be biologically relevant. rVISTA analysis proceeds in four main steps: (i) detect TFBS matches in each individual sequence using position weight matrices (PWMs) from the TRANSFAC database, (ii) identify pairs of locally aligned TFBSs, (iii) select TFBSs present in regions of high DNA conservation and (iv) create a graphical display that dynamically overlays individual or clustered TFBSs with the conservation profile of the genomic locus. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users either to search for matrices in the TRANSFAC library collection or to search for user-defined consensus sequences.
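Steps (i) and (iii) can be illustrated with a small sketch: scan a sequence with a log-odds PWM, then keep only hits that fall in well-conserved positions. The matrix, score threshold, conservation profile and function names are all hypothetical (real matrices would come from TRANSFAC), and step (ii), pairing locally aligned sites across species, is omitted.

```python
import math

BASES = "ACGT"


def log_odds_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert a position frequency matrix (list of {base: count}) to log2-odds."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Step (i): return (start, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def conserved_hits(hits, conservation, width, min_conservation):
    """Step (iii): keep hits whose mean per-base conservation is high enough."""
    return [(start, score) for start, score in hits
            if sum(conservation[start:start + width]) / width >= min_conservation]
```

With a toy matrix favouring `TGA` and the sequence `CCTGACCTGACC`, the scan finds sites at offsets 2 and 7; if only the first half of the locus is conserved, the conservation filter keeps just the site at offset 2, mirroring how rVISTA discards matches outside conserved regions.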
Finds putative bacteriocin open reading frames (ORFs) in a DNA sequence. BAGEL uses knowledge-based bacteriocin databases and motif databases to make the identification. It combines direct and indirect mining by looking at context genes. This tool integrates RNA-seq data and promoter and terminator predictions. It can also investigate the surrounding region of the genome for genes that might encode proteins involved in biosynthesis, transport, regulation and/or immunity.
Allows clustering and searching of large protein datasets, such as UniProt, or 6-frame translated metagenomic sequencing reads. MMseqs is a software suite with three core modules: a prefiltering module, an SSE2- and multi-core-parallelized local alignment module, and a clustering module. In addition, three workflows for sequence searching, clustering, and updating a clustering are intended to facilitate the most common tasks for non-expert users.
Predicts prokaryotic species based on the number of overlapping (co-occurring) k-mers between the query genome and genomes in a reference database. KmerFinderJS, an implementation of KmerFinder, identifies bacterial species in whole-genome data. The software uses Redis, a centralised in-memory database, to store the reference database.
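The core scoring idea, counting k-mers shared between a query and each reference genome, is sketched below. Python sets held in memory stand in for the Redis-backed reference store, the default k = 16 is an assumption chosen for illustration, and KmerFinder's own k-mer sampling and statistics are not reproduced. Names are hypothetical.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(query, references, k=16):
    """Rank reference genomes by the number of k-mers shared with the query.

    references: {species_name: genome_sequence}; held in memory here as a
    stand-in for an external k-mer store.
    """
    query_kmers = kmers(query.upper(), k)
    scores = [(name, len(query_kmers & kmers(genome.upper(), k)))
              for name, genome in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

The top-ranked entry is the predicted species. In practice the reference k-mers would be indexed once, which is why a dedicated store such as Redis is useful, rather than recomputed for every query as in this sketch.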
Allows data mining and visualization of next-generation sequencing (NGS) samples, such as enrichment patterns of DNA-interacting proteins at functional genomic regions. ngs.plot has a built-in database of functional elements that facilitates the management of genomic coordinates for users. This software supports large sequencing datasets and is available through the Galaxy Tool Shed.
A method that uses a subset of marker genes (MGs) for taxonomic profiling of metagenomes. mOTU is available as standalone software and is also implemented in MOCAT. Species-level profiles are generated by mapping reads from metagenomes to a database (mOTU.v1.padded) consisting of 10 MGs extracted from 3,496 prokaryotic reference genomes (downloaded from NCBI) and 263 publicly available metagenomes (from the MetaHIT and HMP projects).
Includes several types of information on Drosophila genes and genomes. FlyBase is an online database that curates a variety of data from published biological literature, including phenotype, gene expression, interactions (genetic and physical), gene ontology (GO) information and many others. Moreover, data are organized in 31 different data-type reports such as the Gene Report or the Allele Report.
Provides publicly available nucleotide sequences for formally described species. GenBank is a comprehensive public database of nucleotide sequences. It also supports bibliographic and biological annotations. The sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects.
Offers an online catalog of human genes and genetic disorders. OMIM is a database of human genes, genetic disorders and traits. It focuses on the molecular relationship between genetic variation and phenotypic expression. It is based on the published peer-reviewed biomedical literature and is used by overlapping and diverse communities of clinicians, molecular biologists and genome scientists.
Generates, analyzes, and makes available genomic sequence, expression, methylation, and copy number variation (CNV) data on over 11,000 individuals who represent over 30 different types of cancer. The information generated by TCGA is centrally managed and entered into databases as it becomes available, making the data rapidly accessible to the entire research community. TCGA is a collaborative effort led by the National Cancer Institute and the National Human Genome Research Institute to map the genomic and epigenomic changes that occur in multiple types of human cancer, including nine rare tumors. Its goal is to support new discoveries by generating a catalog of somatic aberrations occurring in the different neoplasms, and to accelerate research aimed at improving the diagnosis, treatment, and prevention of cancer.
Provides high-throughput microarray and next-generation sequencing (NGS) functional genomic datasets. GEO archives raw data, processed data and metadata submitted by the research community. Its data are indexed, cross-linked and searchable. The database gives access to several tools and graphical renderings that help users explore and interpret the data available on the platform. It can be useful for developing and testing new hypotheses.
Displays assembled human and other mammalian genomes. UCSC Genome Browser provides browsers for more than 180 assemblies and over 100 species. It provides a collection of tools to explore genomes and conduct analyses including a data integrator to merge and export data from multiple tracks. The platform aims to develop mechanisms for mapping annotations from the reference assembly to the corresponding patch and haplotype regions.
Provides a bioinformatics framework to organise biology around the sequences of large genomes. Ensembl is a comprehensive source of stable automatic annotation of genome sequences, available either as an interactive website or as flat files. It integrates manually annotated gene structures from external sources where available. This resource provides access to all of its services and documentation, including the REST API and BioMart.
Gathers information about transposable elements (TEs) and other types of repeats in eukaryotic genomes. Repbase is an online database that can be used for eukaryotic genome sequence analyses and in studies concerning the evolution of TEs and their impact on genomes. This repository contains more than 38,000 sequences of different families or subfamilies.
Supplies several online resources for biological information. NCBI is a web-based platform gathering information, tools, and functions useful to biology researchers. It allows users to: (1) submit data or manuscripts to NCBI databases; (2) download data from NCBI databases; (3) search scholarly documents or projects; (4) build applications with the help of NCBI APIs and code; and (5) find tools to analyze their own data.
Hosts experimental data for Escherichia coli K-12. The EcoCyc project performs literature-based curation of the entire genome and of transcriptional regulation, transporters, and metabolic pathways. This online database serves the E. coli research community and provides a way to find and compare orthologous genes and metabolic pathways across a wide spectrum of organisms.
Gives access to genome sequences and annotations, and allows exploration of genomic data. JGI Genome Portal provides worldwide statistics on the usage of JGI resources, as well as information about the latest genome releases and new tool development. It can automatically generate and monitor BioSample and BioProject submissions to NCBI. This database also gives users access to other resources such as the Genomes OnLine Database (GOLD).
Provides clusters of orthologous groups (COGs) and updated annotation of those COGs. COGs is a database where organisms are sorted according to the NCBI Taxonomy database. Each gene entry in a COG is now denoted by its gene index (gi) number in the NCBI protein database and is linked to the respective entry in the NCBI’s RefSeq database. It concentrates on prokaryotes (bacteria and archaea).
Provides a database of functional genomics experiments. ArrayExpress includes data generated by sequencing or array-based technologies. This resource integrates the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided through ontology-enabled interfaces include queries based on technology and on sample attributes such as disease, cell type and anatomy.
Offers gene annotation of Ricinus communis, also known as castor bean. Ricinus communis is an online database that provides information on the genome sequence assembly. Over 50% of the genome was identified as repetitive DNA (excluding low-complexity sequences), most of which could not be associated with known element families. Users can obtain information either by querying the database online or by downloading data for local use.
Provides aligned and annotated ribosomal RNA (rRNA) gene sequence data source, along with tools to allow researchers to analyze their own rRNA gene sequences. RDP offers tools for browsing and searching the data collections, for taxonomic classification and nearest neighbor search, for primer-probe testing and for tree building. RDP data and tools are utilized in fields as diverse as human health, microbial ecology, environmental microbiology, nucleic acid chemistry, taxonomy and phylogenetics.
Allows gene annotation of Carica papaya. Carica papaya is an online repository assisting users to annotate the Carica papaya genome using the yrGATE gene structure annotation tool. Moreover, this database offers users different functionalities: genome/gene models, alignment to genome, search/download, annotated protein alignments, or custom track display.
Offers annotation for over 95,000 genomes. RefSeq assigns informative names to genes, provides some annotation for every gene found in each genome it analyzes, and supports comparative studies by using consistent structural and functional annotation methods. This database uses tailored data models and process flows to deliver reference collections for eukaryotes, viruses and prokaryotes.
Provides a resource for data analysis and visualization on a gene-by-gene or genome-wide scale. PlasmoDB is a functional genomic database for Plasmodium spp. It belongs to a family of genomic resources housed under the EuPathDB Bioinformatics Resource Center (BRC) umbrella. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop-down menus. Results can then be combined with each other on the query history page.
Expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups.
Permits users to annotate the genome of Anolis carolinensis, also known as the Carolina anole, an arboreal lizard. Anolis carolinensis is an online database containing a search engine that helps researchers find information on the anole lizard. Data on this platform come from the genome sequencing platform and the genome assembly team of the Broad Institute of MIT and Harvard.