A database containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (−1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).
GTR / Genetic Testing Registry
Centralizes comprehensive information about genetic tests. GTR is a free online resource that organizes critical information such as the purpose of the test, target population, test methods, what the test measures, analytical validity, clinical validity, clinical utility, ordering information and test credentials as well as the laboratory name, location, contact information, certifications and licenses.
Automatically detects homologs among the genes of 21 completely sequenced eukaryotic genomes. HomoloGene is an automated system for constructing putative homology groups from the complete gene sets of a wide range of eukaryotic species. Reports include homology and phenotype information drawn from Online Mendelian Inheritance in Man, Mouse Genome Informatics, Zebrafish Information Network, Saccharomyces Genome Database and FlyBase.
IMEx / International Molecular Exchange
Stores a non-redundant set of protein interactions. IMEx is a consortium that has built a data resource enabling users to download, combine, visualize and analyze data from multiple resources in a single format. The database collates experimental evidence from any species for which interaction data are available. It is curated from direct submissions or peer-reviewed journals.
Provides information about human disorders that have a genetic component. MedGen organizes terms from multiple sources by assigning them a concept ID, and then adds value by reporting practice guidelines, or related genes from NCBI’s Gene database. By reporting disorders, findings, clinical features and drugs, this resource supports querying for disorders that share clinical features and drugs and their responses.
VISTA Enhancer Browser
A central resource for experimentally validated human and mouse noncoding fragments with gene enhancer activity as assessed in transgenic mice. The purpose of VISTA Enhancer Browser is to facilitate public access to a large and consistent dataset of such enhancers both for experimental and computational biologists. Candidate noncoding regions for experimental testing are identified based on their conservation between the human and other vertebrate genomes.
Membrane protein function and stability have been shown to depend on the lipid environment. CGDB is a database of membrane protein/lipid interactions obtained by coarse-grained molecular dynamics simulations.
Provides a database of potassium (K+) ion channel homology models and molecular dynamics simulations. KDB is an online resource in which K+ channels are separated into 2 categories: (i) homology models and crystal structures and (ii) molecular dynamics simulations.
Provides access to a list of membrane protein structures.
A database of β-barrel outer membrane proteins from Gram-negative bacteria. Information included in OMPdb consists of sequence data, as well as annotation for structural characteristics (such as the transmembrane segments), literature references and links to other public databases, a combination of features that is unique worldwide. OMPdb is useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels.
OPM / Orientations of Proteins in Membranes
Provides spatial positions of membrane-bound peptides and proteins of known 3D structure in the lipid bilayer. OPM is a curated web resource which contains more than 1200 transmembrane and peripheral proteins and peptides from approximately 350 organisms that represent approximately 3800 Protein Data Bank (PDB) entries. The database also indicates structural classification, topology and intracellular localization.
PDBTM / Protein Data Bank of Transmembrane Proteins
A collection of transmembrane proteins with known structures that is updated automatically. PDB_TM is a database that uses only structural information to locate the most likely position of the lipid bilayer and to distinguish transmembrane from globular proteins. It can also be used to study any particular transmembrane protein and to find, for example, a binding site located in the lipid or in the aqueous phase.
A comprehensive database that gathers all prediction outputs concerning complete prokaryotic proteomes. CoBaltDB is a powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible.
Human DNA Polymerase Gamma Mutation Database
This database lists all known mutations in the coding region of the POLG gene and describes the associated disease.
Reports published and unpublished data on human mitochondrial DNA variation. MITOMASTER gives instructions showing how to submit sequences to identify nucleotide variants relative to the rCRS, to determine the haplogroup, and to view species conservation. User-supplied sequences, GenBank sequences and single nucleotide variants may be analyzed. MITOMAP consists of three main sections: i) background information about the human mitochondrial DNA; ii) an annotated listing of mtDNA variants, both general population and patient; and iii) the MITOMASTER analysis tool.
mtDB / Human Mitochondrial Genome Database
A web-based database of human whole genome and complete coding region sequences. mtDB is the only comprehensive online source for the data contained within it. This includes the sequences themselves as many have not been deposited in a publicly available database such as GenBank. The list of mitochondrial polymorphisms continually grows with the addition of new sequences and is an important resource for phylogenetic and medical studies. The ability to search for multiple-variant haplotypes adds further detail to the latent data.
Provides a phylogenetic tree of global human mitochondrial DNA (mtDNA) variation, based on both coding- and control-region mutations, and including haplogroup nomenclature. It is a database which gives access to references to consulted papers as well as to accession numbers of underlying NCBI GenBank sequences. The database is meant as a framework for scientists interested in the description and application of human mtDNA diversity.
DDBJ / DNA Data Bank of Japan
Provides public archival, retrieval and analytical services for biological information. DDBJ furnishes an analytical environment for domestic researchers to examine large-scale biology data. It offers access to a large collection of databases covering the archiving of sequences with functional annotation and molecular abundance. This platform allows data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan.
Hosts experimental data for Escherichia coli K-12. The EcoCyc project performs literature-based curation of the entire genome, and of transcriptional regulation, transporters, and metabolic pathways. It is an online database that can serve for the E. coli research community and provides a way to find and compare orthologous genes and metabolic pathways across a wide spectrum of organisms.
Provides a bioinformatics framework to organize biology around the sequences of large genomes. Ensembl provides stable automatic annotation of genome sequences, available as either an interactive website or as flat files. It can integrate manually annotated gene structures from external sources where these are available. This resource includes access to all services and documentation, including the REST API and BioMart.
Ensembl Genomes
An integrating resource for genome-scale data from non-vertebrate species.
Simulates Illumina reads using empirical profiles. pIRS is a simulator developed to produce reads similar to those generated by the Illumina platform. This method is helpful for developing next-generation sequencing (NGS) software such as de novo assembly, single-nucleotide polymorphism (SNP) calling and structural variation detection. It can also be useful for applications that need heterozygous data.
Provides publicly available nucleotide sequences for formally described species. GenBank is a comprehensive public database of nucleotide sequences. It also supports bibliographic and biological annotations. The sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects.
Gathers gene-specific information from several sources such as Gene Ontology (GO). Gene compiles genomes that are completely represented by whole genome shotgun (WGS) assemblies with a unique GeneID. The database contains sequences from several different taxonomic identifiers including viruses, bacteria or eukaryotes. Searches can be made by simple text queries, or by advanced filtered searches in a specific field.
INSDC / International Nucleotide Sequence Database Collaboration
Prompts researchers to make initiatives in public domain data sharing. INSDC is an online repository that consists of three nodes: DNA Data Bank of Japan (DDBJ); European Nucleotide Archive (ENA); and GenBank. It regroups information including raw sequence reads and alignments in the read archives (SRA) and assembled sequences with functional annotation in the traditional archives.
MaizeGDB / Maize Genetics and Genomics Database
Provides several types of information about corn. MaizeGDB is an online repository offering several functions, such as a genome browser and a bin viewer. It also proposes different tools allowing users to work on Zea mays, such as: (1) SNPversity, which permits researchers to compare single nucleotide polymorphisms (SNPs); (2) a BLAST tool assisting users to BLAST datasets at several sites. A “data centers” page supplies many filters to simplify users’ searches.
MGI / Mouse Genome Informatics
Provides access to integrated genetic, genomic, and biological data about the laboratory mouse. MGI aims to facilitate study of human health and disease. It includes several topic areas: genes; phenotypes and mutant alleles; human-mouse: disease connection; recombinase; function; strains, single nucleotide polymorphisms and polymorphisms; vertebrate homology; pathways; batch data and analysis tools; and nomenclature. The database contains the Mouse Genome Database (MGD), the Gene Expression Database (GXD) and the Mouse Tumor Biology (MTB) database.
Gives access to genomics data and allows their analysis using web-based analytical tools. MycoCosm is an integrated fungal genomics resource that hosts more than 250 publicly available fungal genomes. Data can be searched using keywords or sequence, investigated using analytical tools, and downloaded for custom analyses. The database enables in-depth multidimensional analysis of individual genomes and efficient comparative genomics of fungi, which may be applied to phylogenetically related fungi, or to those sharing the same lifestyle, ecological niche or phenotypic trait.
PATRIC / Pathosystems Resource Integration Center
Aims to assist scientists in infectious-disease research. PATRIC is a National Institutes of Health (NIH)-supported bioinformatics resource center that has been built to enable comparative genomic analysis of bacterial pathogens. The database provides researchers with an online resource that stores and integrates a variety of data types (e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data) and associated metadata. Tools and services for bacterial infectious disease research are also available.
RefSeq / Reference Sequence
Offers annotation for over 95 000 genomes. RefSeq assigns informative names to genes, provides some annotation for every gene found in each genome it analyzes, and supports comparative studies by using consistent structural and functional annotation methods. This database uses tailored data models and process flows to deliver reference collections for eukaryotes, viruses and prokaryotes.
RGD / Rat Genome Database
Provides a comprehensive data repository and informatics platform related to the laboratory rat, one of the most important model organisms for disease studies. Rat Genome Database (RGD) maintains and updates datasets for genomic elements such as genes, transcripts and increasingly in recent years, sequence variations, as well as map positions for multiple assemblies and sequence information. Functional annotations for genomic elements are curated from published literature, submitted by researchers and integrated from other public resources. Complementing the genomic data catalogs are those associated with phenotypes and disease, including strains, quantitative trait loci (QTL) and experimental phenotype measurements across hundreds of strains.
SGD / Saccharomyces Genome Database
Compiles comprehensive integrated biological information about the budding yeast Saccharomyces cerevisiae. SGD is a manually-curated database which aims to improve the discovery of functional relationships between sequence and gene products in fungi and higher organisms. The database records information about the yeast genome and its genes, proteins, and other encoded features. Moreover, it contains several bioinformatic tools to facilitate experimental design and analysis.
TAIR / The Arabidopsis Information Resource
Maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Data available from TAIR includes the complete genome sequence along with gene structure, gene product information, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community.
NCBI UniGene
Identifies transcripts from the same locus; analyzes expression by age, tissue and health status; and reports related proteins (protEST) and clone resources.
Compiles data about the biology, genetics and genomics of Caenorhabditis elegans and additional nematodes. WormBase contains the complete lineages for the male and hermaphrodite organisms and information that describes each cell and its primary biological function. It has multiple alternatives to represent genomic information including a graphical display. Besides, users have the possibility to build a customized library with their genes of interest.
Integrates a variety of data from the biomedical model genus Xenopus and archives large data sets that would otherwise be unavailable to the global scientific community. Researchers can easily navigate from genome content to gene page reports, literature, experimental reagents and many other features using hyperlinks. Xenbase has been designed to facilitate direct comparison of information from Xenopus genes with their human orthologs. Search results show all gene pages to which an OMIM entry is linked, and all other OMIM disease associations for that gene.
ZFIN / The Zebrafish Information Network
Provides genetic and genomic data involving zebrafish. ZFIN is composed of mutants, gene expression, phenotypes, knockdown reagents, antibodies, transgenic constructs, and reporter lines. It offers detailed observation of gene expression patterns in wild-type and mutant fish embryos and larvae. The database aims to assist members of the research community to find data of interest and generate new hypotheses.
CDP database / Cell Death Proteomics Database
Comprises data on apoptosis, autophagy, cytotoxic granule-mediated cell death, excitotoxicity, mitotic catastrophe, paraptosis, pyroptosis, and Wallerian degeneration. The CDP database represents a useful tool to consolidate data from proteome analyses of programmed cell death.
CAT / CAZymes Analysis Toolkit
Allows users to access a repository of tools for analysis and annotation of carbohydrate active enzymes (CAZymes). CAT extracts the data content of each CAZyme family from HTML pages through a GET request. This database offers a set of manually annotated enzymes that modify, degrade, or create glycosidic bonds. It can be used to predict CAZymes in the P. trichocarpa genome and in the E. coli K12-MG1655 genome.
CAZy / Carbohydrate-Active Enzymes database
Describes the families of structurally-related catalytic and carbohydrate-binding modules of enzymes that degrade, modify or create glycosidic bonds. CAZy is a knowledge-based resource that aims to link the sequence, the specificity and the 3D structural features of CAZymes, the Carbohydrate-Active enZymes. The CAZomes listed in the CAZy website correspond to protein models of finished genomes, i.e. with proteins released in the daily releases of GenBank.
COGs / the Clusters of Orthologous Groups of proteins
Provides clusters of orthologous groups (COGs) and updated annotation of those COGs. COGs is a database where organisms are sorted according to the NCBI Taxonomy database. Each gene entry in a COG is now denoted by its gene index (gi) number in the NCBI protein database and is linked to the respective entry in the NCBI’s RefSeq database. It concentrates on prokaryotes (bacteria and archaea).
eggNOG / evolutionary genealogy of genes: Non-supervised Orthologous Groups
Provides orthologous groups (OGs) of proteins at different taxonomic levels. eggNOG is a database dedicated to orthologous groups and functional annotation. It provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. This tool also contains a framework for mapping novel sequences to OGs based on precomputed hidden Markov model (HMM) profiles.
Provides orthologs, genes inherited by extant species from a single gene in their last common ancestor. OrthoDB offers a pipeline dedicated to the delineation of orthologs that relies on assessments of pairwise gene homology between complete genomes and their subsequent clustering. It contains more than 37 million genes from 6 000 prokaryotes and over 1 200 eukaryotes.
Provides a set of multiple sequence alignments and hidden Markov models (HMMs) for protein families. Pfam is constructed by capturing the diversity of a set of evolutionarily related sequences. It aligns a representative subset of the entire set of matching sequences to build the seed alignment. This database provides more than 17000 entries which are related by similarity of sequence, structure or profile-HMM.
A comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
Provides a motif descriptor database. PROSITE offers an annotated collection of biologically meaningful motif descriptors dedicated to the identification of protein families and domains. This database uses two kinds of motif descriptors: (i) patterns or regular expressions in which the most significant residue information is discarded, and (ii) generalized profiles and quantitative motif descriptors that consider the overall similarity on the entire length of domains or proteins.
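As an illustration of the first kind of descriptor, the sketch below translates a PROSITE-style pattern into a Python regular expression. It assumes the commonly documented pattern syntax (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repetition, "<" and ">" for sequence ends). The function name prosite_to_regex and the test sequence are hypothetical, and this is not PROSITE's own scanning software.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string (sketch)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the residue specification from an optional (n) or (n,m) repeat.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{"):  # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# The widely cited N-glycosylation site pattern, applied to a made-up sequence.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hits = re.findall(regex, "MNGTAANPSA")
print(regex, hits)
```

A pattern of this kind only reports whether a site matches or not. Generalized profiles assign a quantitative score instead, which the sketch does not cover.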
A resource consisting of curated multiple sequence alignments, Hidden Markov Models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins.
A freely editable semantic wiki for community-based curation of the terms used in Neuroscience.
HTS Mappers
This page attempts to provide an up-to-date compendium of HTS mappers initially provided in the article "Tools for mapping high-throughput sequencing data".
The NGS WikiBook
A collaborative next-generation sequencing (NGS) resource. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
CMS / Cancer Methylome System
A web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research.
DBCAT / DataBase of CpG islands and Analytical Tool
A database developed in order to recognize comprehensive methylation profiles of DNA alteration in human cancer. DBCAT is an online methylation analytical tool composed of three parts: a CpG Island Finder, a genome query browser and an analytical tool for methylation microarray data. The analytical tool can analyze raw data generated from scanners and search for genes with methylated regions which could affect gene expression regulation. DBCAT not only identifies the regions of methylation but also searches the database to pick up genes with methylated regions of functional significance.
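To make the idea of a CpG island finder concrete, here is a minimal sketch of the two statistics such tools typically compute on a sequence window: GC content and the observed/expected CpG ratio. The thresholds used (length of at least 200 bp, GC content of at least 50%, ratio of at least 0.6) are the classic Gardiner-Garden and Frommer criteria and are an assumption here. The entry above does not state which criteria DBCAT applies, and the function names are hypothetical.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a DNA window."""
    seq = window.upper()
    n = len(seq)
    c_count, g_count = seq.count("C"), seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides cannot overlap each other
    gc_content = (c_count + g_count) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg_count * n) / (c_count * g_count) if c_count and g_count else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(window: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed length, GC and observed/expected thresholds."""
    gc_content, obs_exp = cpg_stats(window)
    return len(window) >= min_len and gc_content >= min_gc and obs_exp >= min_obs_exp


print(cpg_stats("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```

A real finder would slide such a window along the genome and merge adjacent qualifying windows. That step is omitted here.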
Presents the most complete collection and annotation of aberrant DNA methylation in human diseases, especially various cancers. DiseaseMeth is focused not only on curated information about diseases, genes and corresponding methylation data, but also on predicted associations between diseases of interest and methylation of specific DNA regions based on the vast amounts of data that it contains. DiseaseMeth contains methylation data of 32701 samples from 88 diseases together with 679602 associations between diseases and methylation of genes. DiseaseMeth not only enlarges the data of increased DNA methylation, but also provides new tools to explore the relationships between methylation of genes and diseases.
A database for histone mutations and their phenotypes. The database collects phenotypic screening data from assays of systematically constructed histone mutants: single-residue substitutions, multiple substitutions, correlation with known post-translational modifications, and cross-species mapping.
The purpose of this database is to provide the scientific community with a resource to store DNA methylation data and to make these data readily available to the public.
Allows the study of the interplay between DNA methylation, gene expression and cancer. MethyCancer contains: (i) CpG Island (CGI) clones and global CGI predictions, (ii) DNA methylation data, (iii) cancer information, genes and mutations and (iv) correlation among DNA methylation, gene expression and cancer. It provides users with a search engine to query different data types and data interactions, and offers keyword search as well as advanced searches, namely Methylation Search, Gene Search, Cancer Search, Clone Search and Repeat Search.
Includes genome-wide DNA methylation profiles for human and mouse brains. MethylomeDB offers an important resource for research into brain function and behavior. It provides the first source of comprehensive brain methylome data, encompassing whole-genome DNA methylation profiles of human and mouse brain specimens that facilitate cross-species comparative epigenomic investigations, as well as investigations of schizophrenia and depression methylomes.
Furnishes a collection of single-base whole-genome methylome maps for the best-assembled eukaryotic genomes. NGSmethDB is a database simplifying the analysis of methylation data from different sources. Heterogeneous methylation data can be either simultaneously visualized through a web interface or selectively downloaded by means of the provided data mining tools. It allows researchers to design new experiments and retrieve the adequate data for them.
PEpiD / Prostate Epigenetic Database
Stores the curated epigenetic data retrieved by literature mining, which previous studies indicated as involved in prostate cancer (PC) of human, mouse, and rat. A user-friendly interface is implemented for easy and flexible query. PEpiD can serve as an important resource for epigenetic research in PC.
NCBI Epigenomics
Explore, view, and download genome-wide maps of DNA and histone modifications from our diverse collection of epigenomic data sets. The Epigenomics resource also provides the user with a unique interface that allows for intuitive browsing and searching of data sets based on biological attributes.
Cistrome DB / Cistrome Data Browser
Provides an annotated knowledgebase of published or public ChIP-seq and DNase-seq data in mouse and human. Cistrome DB contains more than 2 500 ChIP-seq datasets for transcription and chromatin regulators, over 2 000 for histone modifications and variants, 400 DNase-seq datasets and about 1 000 control datasets. It relies on the automatic parsing of sample metadata from the data source.
A web resource that aims to facilitate better hypothesis generation through knowledge synthesis, mediated by better data integration and a user-friendly web interface. pfSNP integrates different algorithms and resources to interrogate thousands of SNPs from the dbSNP database for SNPs of potential functional significance, based on previously published reports, potential functionality inferred from genetic approaches, and potential functionality predicted from sequence motifs.
AnimalTFDB / Animal Transcription Factor DataBase
Gathers animal transcription factor (TF) lists and annotations, and provides prediction tools. AnimalTFDB is an animal TF database, which contains classification and annotation of genome-wide TFs and transcription cofactors in more than 90 animal genomes. The database provides annotations including gene phenotype and expression data in several species. TFs are classified into families, with one of them named “Others” including some orphan TFs. The prediction pipeline can be useful for TF identification in newly sequenced genomes.
Holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations.
Provides a hierarchical, multi-layered concept of transcriptional regulation. CoryneRegNet consists of an ontology-based data warehouse. It employs a modular data processing pipeline that can recognize clusters of homologous proteins, match binding site motifs, determine operons and display special networks and graphs. This platform is useful for large-scale analysis of transcriptional regulation of gene expression in corynebacterial microorganisms.
DBTBS / database of transcriptional regulation in B. subtilis
Provides information about the Bacillus subtilis transcription system. DBTBS is composed of more than 100 binding factors and over 600 promoters of about 500 regulated genes. It can be used to demonstrate the presence or absence of potentially orthologous transcription factors and their corresponding cis-elements. This platform permits users to find the transcription factors that correspond to an input position-specific weight matrix.
DBTSS / DataBase of Transcriptional Start Sites
Provides exact positions of transcriptional start sites (TSSs) in the genome. DBTSS was developed to facilitate the analyses regarding how germline variations or somatic mutations in cancers residing in transcriptional regulatory regions may affect the transcriptional regulation of their target genes in the diseased genome contexts. This resource also includes external epigenomic data.
YeTFaSCo / Yeast Transcription Factor Specificity Compendium
A collection of all available TF specificities for the yeast Saccharomyces cerevisiae in Position Frequency Matrix (PFM) or Position Weight Matrix (PWM) formats.
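The relationship between the two formats can be sketched as follows: a PFM holds raw base counts per motif position, and a PWM is commonly derived from it as log-odds scores against a background frequency. The pseudocount scheme, the uniform background of 0.25 and the toy counts below are illustrative assumptions, not values taken from YeTFaSCo, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert a position frequency matrix (counts) into log2-odds weights."""
    length = len(pfm["A"])
    pwm = {base: [] for base in BASES}
    for i in range(length):
        total = sum(pfm[base][i] for base in BASES)
        for base in BASES:
            # The pseudocount is spread over the bases according to the
            # background, which avoids taking log(0) for unobserved bases.
            freq = (pfm[base][i] + pseudocount * background) / (total + pseudocount)
            pwm[base].append(math.log2(freq / background))
    return pwm


def score_site(pwm, site):
    """Sum the position-specific weights along a candidate binding site."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


# Toy two-position motif: four aligned sites that all read "AC".
toy_pfm = {"A": [4, 0], "C": [0, 4], "G": [0, 0], "T": [0, 0]}
toy_pwm = pfm_to_pwm(toy_pfm)
print(score_site(toy_pwm, "AC"), score_site(toy_pwm, "GT"))
```

Sites matching the consensus score positive and mismatching sites score negative, which is what makes the PWM form convenient for scanning sequences.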
WebGeSTer DB / Web Genome scanner for terminators database
Informs users about sequenced bacterial genomes and plasmids. WebGeSTer DB consists of all types of intrinsic terminators identified in about 1000 bacterial chromosomes and more than 700 plasmids available at the NCBI database. This database provides users with several whole-genome terminator maps.
Provides data of computationally predicted regulatory interactions within the genomes of several organisms of this group. Tractor_DB contains orthology relationships between gene pairs that are constructed with the bidirectional best hits (BBH) methodology. It permits the user to directly retrieve the information regarding the conservation of regulatory interactions within a given regulon from a map that contains all known Escherichia coli transcription factors (TFs) and the regulatory interactions that interconnect them.
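The bidirectional best hits (BBH) idea can be sketched in a few lines: two genes are paired when each is the other's top-scoring hit in a similarity search between two genomes. The gene names and scores below are invented for illustration and the helper names are hypothetical. The entry does not describe how Tractor_DB scores similarity, so treat this as a sketch of the general technique only.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs (a, b) that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (e.g. alignment bit scores) in both directions.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 110.0}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data a3 has b2 as its best hit, but b2 prefers a2, so a3 is left without a putative ortholog.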
Aims at classifying eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs). For this, a classification schema comprising four generic levels (superclass, class, family and subfamily) was defined that could accommodate all known DNA-binding human TFs. TFClass is freely available through a web interface and for download in OBO format.
A database that facilitates the exploration of proteins involved in the regulation of transcription in humans.
STIFDB / Stress Responsive Transcription Factor Database
Provides a database of abiotic stress responsive genes. STIFDB is a resource that analyses promoters of abiotic stress responsive genes for potential stress-specific transcription factor binding sites. This resource can provide insights into the regulation of these stress responsive genes by upstream transcription factors. It also offers clues towards stress signal that affects the transcription of this gene, which might offer clarity about signal specific regulation.
A yeast-specific promoter database. SCPD provides access to yeast genes, regulatory elements and transcription factors, as well as to analysis tools. It can, for example, retrieve promoter sequences, search for consensus sequences or make multisequence alignments. Users can extend the database with gene, consensus or matrix records.
Provides access to information about regulation of transcription initiation of Escherichia coli K-12. RegulonDB is a resource that contains decades of knowledge from classic molecular biology experiments, and from high-throughput genomic methodologies. It provides datasets for interactions for which there is no evidence that they affect expression, as well as expression datasets. A set of tools is also available.
A database for capturing, visualization and analysis of transcription factor regulons that were reconstructed by the comparative genomic approach in a wide variety of prokaryotic genomes.
A curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs).
Intended to collect confirmed translation initiation sites (TISs) for prokaryotic genomes.
PRODORIC / PROkaryotIC Database Of Gene-Regulation
Provides information about gene regulation in prokaryotes. PRODORIC is a database that gathers DNA binding sites for prokaryotic transcription factors. This repository includes entries generated by manually screening the literature, as well as transcription factor binding sites (TFBSs) detected by diverse high-throughput techniques. The database provides a basis for the prediction of gene regulatory networks (GRNs). The web application Virtual Footprint, for recognizing DNA patterns in prokaryotic genomes, is also available, but only the most essential options are offered.
Describes more than 100,000 computationally predicted transcriptional regulatory modules within the human genome.
A plant promoter database that provides information on transcription start sites (TSSs), core promoter structure and regulatory element groups (REGs) as putative and comprehensive transcriptional regulatory elements.
PlantProm / Plant Promoter database
Describes the procedure used to collect promoter data and the specific features of plant promoter sequences. PlantProm DB serves as a learning set for developing plant promoter prediction programs. It provides information on plant promoters with an experimentally determined transcription start site (TSS): (i) DNA sequence of the promoter region, (ii) Nucleotide Frequency Matrices (NFM) for canonical promoter elements, (iii) taxonomic and promoter type classification of promoters.
MPromDb / Mammalian Promoter Database
Integrates gene promoters with experimentally supported annotation of transcription start sites, cis-regulatory elements, CpG islands and chromatin immunoprecipitation microarray (ChIP-chip) experimental results in an intuitively designed presentation. Users can search the database by gene ID/symbol or by specific tissue/cell type, and filter results based on any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters.
MAPPER database / Multi-genome Analysis of Positions and Patterns of Elements of Regulation
Contains putative Transcription Factor Binding Sites (TFBSs) located in the upstream sequences of genes from the human, mouse and D. melanogaster genomes.
Gives access to Drosophila melanogaster 5’-end mRNA tags at different developmental states. MachiBase is designed to assist fly biologists in their analyses of gene expression and in placing expression data in the context of functional genomics through genomic orientation. Users can access information on differentially expressed genes by either inputting the gene name as a keyword or selecting a chromosomal location. The database can assist biologists in explaining transcriptional initiation mechanisms by combining additional information on chromatin structure and DNA methylation.
A database of DNA binding specificities for Drosophila transcription factors (TFs) primarily determined using the bacterial one-hybrid system. FlyFactorSurvey provides community access to over 400 recognition motifs and position weight matrices for over 200 TFs, including many unpublished motifs. Search tools and flat file downloads are provided to retrieve binding site information (as sequences, matrices and sequence logos) for individual TFs, groups of TFs or for all TFs with characterized binding specificities. Linked analysis tools allow users to identify motifs within the database that share similarity to a query matrix or to view the distribution of occurrences of an individual motif throughout the Drosophila genome. Together, this database and its associated tools provide computational and experimental biologists with resources to predict interactions between Drosophila TFs and target cis-regulatory sequences.
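To make the idea of locating motif occurrences with a position weight matrix more concrete, the following is a minimal Python sketch. It is not FlyFactorSurvey's own tooling: the count matrix, the function names `to_log_odds` and `scan`, the pseudocount and the uniform background frequency are all assumptions chosen for illustration.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# These numbers are invented for illustration, not taken from any database.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = to_log_odds(COUNTS)
HITS = scan("TTACGTAA", PWM, threshold=4.0)
```

In this toy example only the window ACGT at offset 2 passes the threshold. A real analysis would also scan the reverse strand and use a background model estimated from the genome.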
EPD / Eukaryotic Promoter Database
An annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.
A database designed to provide access to reliable annotations of the alternative splicing pattern of human genes, obtained with the ASPic algorithm, and to the functional annotation of predicted isoforms.
A database of new exon boundaries induced by pathogenic mutations in human disease genes. This resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays.
Gathers functional annotation for alternatively spliced (AS) genes. ECgene contains domain, Gene Ontology (GO) and expression pattern analyses based on EST and SAGE data. It also provides tools to study differential expression patterns, which may assist in the recognition of tissue- and/or cancer-specific genes. This platform permits users to infer the functional significance of each splice variant.
EDAS / EST Derived Alternative Splicing database
A database of alternatively spliced human genes that contains data on the alignment of proteins, mRNAs and ESTs. EDAS contains information on all exons and introns observed, as well as elementary alternatives formed from them. The output can be filtered by adjusting the significance-level cut-off.
EID / Exon-Intron Database
Offers a comprehensive and convenient dataset of sequences for computational biologists who study exon-intron gene structures and pre-mRNA splicing. The collection of exons and introns has been extended beyond coding regions and current versions of EID contain data on untranslated regions of gene sequences as well. Intron-less genes are included as a special part of EID. For species with entirely sequenced genomes, species-specific databases have been generated. A novel Mammalian Orthologous Intron Database (MOID) has been introduced which includes the full set of introns that come from orthologous genes that have the same positions relative to the reading frames.
H-DBAS / Human-transcriptome DataBase for Alternative Splicing
Supplies data about human alternative splicing (AS) variants from the viewpoint of the protein functions affected by AS. H-DBAS is based on cDNA information from the H-Invitational cDNA Annotation Project that was manually inspected and annotated. It lets users explore human AS. This database classifies AS events according to whether they are transcribed from conserved genomic regions or whether the corresponding transcripts are also identified in mice.
A free database that provides a list of human internal exons and reports all their known splice events based on EST information from the UCSC Genome Browser. The user can restrict this list to a specific region of the genome (by specifying the chromosome, the strand and the start and end positions), to a whole chromosome or to a group of genes. Furthermore, exons can be filtered according to their splicing type (constitutive exons, cassette exons and exons with one or more alternative 3′ and/or 5′ splice sites).
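The restrictions described above (region, chromosome, strand, gene group and splicing type) can be sketched in Python as follows. This is a rough illustration only: the `Exon` record, its field names, the `filter_exons` function and the sample data are hypothetical, not the database's actual schema or interface, and the sketch assumes that an exon must lie entirely inside the requested region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; the field names are assumptions for illustration."""
    chrom: str
    strand: str
    start: int
    end: int
    gene: str
    splice_type: str  # e.g. "constitutive", "cassette", "alt_3prime", "alt_5prime"


def filter_exons(exons, chrom=None, strand=None, start=None, end=None,
                 genes=None, splice_types=None):
    """Keep exons matching every criterion that is not None."""
    selected = []
    for exon in exons:
        if chrom is not None and exon.chrom != chrom:
            continue
        if strand is not None and exon.strand != strand:
            continue
        # Assumption: the exon must be fully contained in [start, end].
        if start is not None and exon.start < start:
            continue
        if end is not None and exon.end > end:
            continue
        if genes is not None and exon.gene not in genes:
            continue
        if splice_types is not None and exon.splice_type not in splice_types:
            continue
        selected.append(exon)
    return selected


# Invented sample data, for demonstration only.
EXONS = [
    Exon("chr1", "+", 100, 200, "GENE_A", "constitutive"),
    Exon("chr1", "+", 500, 650, "GENE_A", "cassette"),
    Exon("chr1", "-", 900, 1000, "GENE_B", "cassette"),
    Exon("chr2", "+", 100, 180, "GENE_C", "alt_3prime"),
]

cassette_in_region = filter_exons(
    EXONS, chrom="chr1", strand="+", start=50, end=700, splice_types={"cassette"}
)
by_gene_group = filter_exons(EXONS, genes={"GENE_B", "GENE_C"})
```

Leaving a criterion as `None` disables it, so the same function covers a region query, a whole-chromosome query and a gene-group query.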
This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene.
Yeast Intron Database
Gathers information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. The Yeast Intron Database inventories the known spliceosomal introns in the yeast genome and documents the splice sites used. The database also aims to identify and analyze splice site context in terms of the nature and activities of the trans-acting factors that mediate splice site recognition.
Spliceosome Database
A database of spliceosome-associated proteins and snRNAs. SpliceosomeDB provides tools to search for spliceosome genes/proteins based on several characteristics including name(s), complex designation, identification in particular mass spectrometry experiments, source organism and conserved motif/domain signatures. Each gene/protein is linked to additional sources of information and to orthologous genes in several model systems. Tools are also available for comparing the composition of different intermediate splicing complexes and for directly examining the lists of proteins identified in mass spectrometry experiments analyzing purified spliceosome complexes.