Provides a set of multiple sequence alignments and hidden Markov models (HMMs) for protein families. Pfam is constructed by capturing the diversity of a set of evolutionarily related sequences. It aligns a representative subset of the entire set of matching sequences to build the seed alignment. This database provides more than 17000 entries which are related by similarity of sequence, structure or profile-HMM.
A comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
Provides a motif descriptor database. PROSITE offers an annotated collection of biologically meaningful motif descriptors dedicated to the identification of protein families and domains. This database uses two kinds of motif descriptors: (i) patterns or regular expressions in which the most significant residue information is discarded, and (ii) generalized profiles and quantitative motif descriptors that consider the overall similarity on the entire length of domains or proteins.
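As an illustration of the first descriptor type, the sketch below translates a PROSITE-style pattern into a Python regular expression. It assumes the commonly documented pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repetition, `<` and `>` as terminal anchors). The function name `prosite_to_regex` is hypothetical, anchors placed inside square brackets are not handled, and this is not PROSITE's own scanning software.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: anchors inside square brackets (e.g. [G>]) are not handled.
    """
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{"):         # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # Allowed-residue sets [..] and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(prefix + core + suffix)
    return "".join(parts)


# Illustrative zinc-finger-like pattern and a made-up sequence.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
hit = re.search(regex, "MKCAACAAALAAAAAAAAHAAAHK")
```

A pattern either matches or does not, which is what distinguishes it from the quantitative profile descriptors mentioned above.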
A resource consisting of curated multiple sequence alignments, Hidden Markov Models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins.
A freely editable semantic wiki for community-based curation of the terms used in Neuroscience.
This page attempts to provide an up-to-date compendium of HTS mappers initially provided in the article "Tools for mapping high-throughput sequencing data".
The NGS WikiBook
A collaborative next-generation sequencing (NGS) resource. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.
CMS / Cancer Methylome System
A web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research.
DBCAT / DataBase of CpG islands and Analytical Tool
A database developed to recognize comprehensive methylation profiles of DNA alterations in human cancer. DBCAT is an online methylation analytical tool composed of three parts: a CpG Island Finder, a genome query browser and an analytical tool for methylation microarray data. The analytical tool can analyze raw data generated from scanners and search for genes with methylated regions that could affect the regulation of gene expression. DBCAT not only identifies methylated regions but also searches the database to retrieve genes whose methylated regions have functional significance.
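The source does not describe how DBCAT's CpG Island Finder works. The sketch below only illustrates the general idea behind such finders, assuming the widely used Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, GC fraction above 0.5, observed/expected CpG ratio above 0.6). The function names and default thresholds are assumptions for illustration.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                    # observed CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len=200, min_gc=0.5, min_obs_exp=0.6) -> bool:
    """Apply the assumed length, GC-content and CpG-ratio thresholds to one region."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```

A real finder would typically slide a window along the genome and merge adjacent qualifying windows; that step is omitted here.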
Presents the most complete collection and annotation of aberrant DNA methylation in human diseases, especially various cancers. DiseaseMeth is focused not only on curated information about diseases, genes and corresponding methylation data, but also on predicted associations between diseases of interest and methylation of specific DNA regions based on the vast amounts of data that it contains. DiseaseMeth contains methylation data of 32701 samples from 88 diseases together with 679602 associations between diseases and methylation of genes. DiseaseMeth not only enlarges the data of increased DNA methylation, but also provides new tools to explore the relationships between methylation of genes and diseases.
A database for histone mutations and their phenotypes. The database collects phenotypic screening data from assays of systematically constructed histone mutants: Single-residue substitutions, multiple substitutions, correlation with known post-translational modifications, cross-species mapping.
The purpose of this database is to provide the scientific community with a resource to store DNA methylation data and to make these data readily available to the public.
Allows the study of the interplay between DNA methylation, gene expression and cancer. MethyCancer contains: (i) CpG Island (CGI) clones and global CGI predictions, (ii) DNA methylation data, (iii) cancer information, genes and mutations and (iv) correlation among DNA methylation, gene expression and cancer. It provides users with a search engine to query different data types and data interactions, and offers a keyword search as well as advanced searches, namely Methylation Search, Gene Search, Cancer Search, Clone Search and Repeat Search.
Includes genome-wide DNA methylation profiles for human and mouse brains. MethylomeDB offers an important resource for research into brain function and behavior. It provides the first source of comprehensive brain methylome data, encompassing whole-genome DNA methylation profiles of human and mouse brain specimens that facilitate cross-species comparative epigenomic investigations, as well as investigations of schizophrenia and depression methylomes.
Furnishes a collection of single-base whole-genome methylome maps for the best-assembled eukaryotic genomes. NGSmethDB is a database simplifying the analysis of methylation data from different sources. Heterogeneous methylation data can be either simultaneously visualized through a web interface or selectively downloaded by means of the provided data mining tools. It allows researchers to design new experiments and retrieve the adequate data for them.
PEpiD / Prostate Epigenetic Database
Stores curated epigenetic data, retrieved by literature mining, that previous studies have implicated in prostate cancer (PC) in human, mouse, and rat. A user-friendly interface is implemented for easy and flexible queries. PEpiD can serve as an important resource for epigenetic research in PC.
Explore, view, and download genome-wide maps of DNA and histone modifications from our diverse collection of epigenomic data sets. The Epigenomics resource also provides the user with a unique interface that allows for intuitive browsing and searching of data sets based on biological attributes.
Cistrome DB / Cistrome Data Browser
Provides an annotated knowledgebase of published or public ChIP-seq and DNase-seq data in mouse and human. Cistrome DB contains more than 2,500 ChIP-seq datasets for transcription and chromatin regulators, over 2,000 for histone modifications and variants, 400 DNase-seq datasets and about 1,000 control datasets. It relies on the automatic parsing of sample metadata from the data source.
A web resource that aims to facilitate hypothesis generation through knowledge synthesis, supported by better data integration and a user-friendly web interface. pfSNP integrates different algorithms and resources to interrogate thousands of SNPs from the dbSNP database for potential functional significance, based on previously published reports, functionality inferred from genetic approaches, and functionality predicted from sequence motifs.
AnimalTFDB / Animal Transcription Factor DataBase
Gathers animal transcription factor (TF) lists and annotations, and provides prediction tools. AnimalTFDB is an animal TF database that contains the classification and annotation of genome-wide TFs and transcription cofactors in more than 90 animal genomes. The database provides annotations including gene phenotype and expression data in several species. TFs are classified into families, one of which, named “Others”, includes some orphan TFs. The prediction pipeline can be useful for TF identification in newly sequenced genomes.
Holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations.
Provides a hierarchical, multi-layered concept of transcriptional regulation. CoryneRegNet consists of an ontology-based data warehouse. It employs a modular data processing pipeline that can recognize clusters of homologous proteins, match binding site motifs, determine operons and display special networks and graphs. This platform is useful for large-scale analysis of transcriptional regulation of gene expression in corynebacterial microorganisms.
DBTBS / database of transcriptional regulation in B. subtilis
Provides information about the Bacillus subtilis transcription system. DBTBS is composed of more than 100 binding factors and over 600 promoters of about 500 regulated genes. It can be used to demonstrate the presence or absence of potentially orthologous transcription factors and their corresponding cis-elements. This platform permits users to find the transcription factors that correspond to an inputted position-specific weighted matrix.
DBTSS / DataBase of Transcriptional Start Sites
Provides exact positions of transcriptional start sites (TSSs) in the genome. DBTSS was developed to facilitate the analyses regarding how germline variations or somatic mutations in cancers residing in transcriptional regulatory regions may affect the transcriptional regulation of their target genes in the diseased genome contexts. This resource also includes external epigenomic data.
YeTFaSCo / Yeast Transcription Factor Specificity Compendium
A collection of all available TF specificities for the yeast Saccharomyces cerevisiae in Position Frequency Matrix (PFM) or Position Weight Matrix (PWM) formats.
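The two matrix formats are related by a simple transformation. The sketch below converts a Position Frequency Matrix of raw counts into a log-odds Position Weight Matrix, assuming a uniform background of 0.25 per base and a pseudocount of 1 split according to the background. Both choices, and the function names, are assumptions for illustration; the source does not specify the conventions YeTFaSCo uses.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=None):
    """Convert per-position base counts into log2 odds scores.

    pfm maps each base to a list of counts, one entry per motif position.
    """
    background = background or {base: 0.25 for base in BASES}
    length = len(pfm["A"])
    pwm = {base: [] for base in BASES}
    for i in range(length):
        total = sum(pfm[base][i] for base in BASES)
        for base in BASES:
            # Pseudocount-smoothed frequency, then log-odds against the background.
            freq = (pfm[base][i] + pseudocount * background[base]) / (total + pseudocount)
            pwm[base].append(math.log2(freq / background[base]))
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights for a candidate binding site."""
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


# Made-up two-position motif that strongly prefers "AG".
example_pwm = pfm_to_pwm({"A": [8, 0], "C": [0, 0], "G": [0, 8], "T": [0, 0]})
```

Positive weights mark bases that are enriched relative to the background, so a higher summed score indicates a closer match to the motif.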
WebGeSTer DB / Web Genome scanner for terminators database
Informs users about sequenced bacterial genomes and plasmids. WebGeSTer DB consists of all types of intrinsic terminators identified in about 1000 bacterial chromosomes and more than 700 plasmids available at the NCBI database. This database provides users with several whole-genome terminator maps.
Provides data of computationally predicted regulatory interactions within the genomes of several organisms of this group. Tractor_DB contains orthology relationships between gene pairs that are constructed with the bidirectional best hits (BBH) methodology. It permits the user to directly retrieve the information regarding the conservation of regulatory interactions within a given regulon from a map that contains all known Escherichia coli transcription factors (TFs) and the regulatory interactions that interconnect them.
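The bidirectional best hits idea can be sketched as follows: two genes are paired when each is the other's top-scoring hit in the opposite genome. The input format (dictionaries keyed by query and subject, holding similarity scores) and the function names are assumptions for illustration; Tractor_DB's actual pipeline is not described in the source.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores maps (query, subject) pairs to a similarity score (higher is better).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up scores: a3's best hit is b1, but b1's best hit is a1, so a3 is unpaired.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 210, ("b2", "a2"): 95, ("b2", "a1"): 40}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```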
Aims at classifying eukaryotic transcription factors (TFs) according to their DNA-binding domains (DBDs). For this, a classification schema comprising four generic levels (superclass, class, family and subfamily) was defined that could accommodate all known DNA-binding human TFs. TFClass is freely available through a web interface and for download in OBO format.
A database that facilitates the exploration of proteins involved in the regulation of transcription in humans.
STIFDB / Stress Responsive Transcription Factor Database
Provides a database of abiotic stress responsive genes. STIFDB is a resource that analyses promoters of abiotic stress responsive genes for potential stress-specific transcription factor binding sites. This resource can provide insights into the regulation of these stress responsive genes by upstream transcription factors. It also offers clues about the stress signal that affects the transcription of a given gene, which might clarify signal-specific regulation.
A yeast-specific promoter database. SCPD provides access to yeast genes, regulatory elements and transcription factors, as well as to analysis tools. For example, it retrieves promoter sequences, searches for consensus sequences and builds multiple sequence alignments. Users can add their own gene, consensus or matrix records to the database.
Provides access to information about regulation of transcription initiation of Escherichia coli K-12. RegulonDB is a resource that contains decades of knowledge from classic molecular biology experiments, and from high-throughput genomic methodologies. It provides datasets for interactions for which there is no evidence that they affect expression, as well as expression datasets. A set of tools is also available.
A database for capturing, visualization and analysis of transcription factor regulons that were reconstructed by the comparative genomic approach in a wide variety of prokaryotic genomes.
A curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs).
Intended to collect confirmed translation initiation sites (TISs) for prokaryotic genomes.
PRODORIC / PROcariotIC Database Of Gene-Regulation
Provides information about gene regulation in prokaryotes. PRODORIC is a database that gathers DNA binding sites for prokaryotic transcription factors. This repository includes entries generated by manually screening the literature, as well as transcription factor binding sites (TFBSs) detected by diverse high-throughput techniques. The database provides a basis for the prediction of gene regulatory networks (GRNs). The web application Virtual Footprint, for recognizing DNA patterns in prokaryotic genomes, is also available, but only the most essential options are offered.
Describes more than 100,000 computationally predicted transcriptional regulatory modules within the human genome.
A plant promoter database that provides information on transcription start sites (TSSs), core promoter structure and regulatory element groups (REGs) as putative and comprehensive transcriptional regulatory elements.
PlantProm / Plant Promoter database
Describes the procedure used to collect promoter data and the specific features of plant promoter sequences. PlantProm DB serves as a learning set for developing plant promoter prediction programs. It provides information on plant promoters with an experimentally known transcription start site (TSS): (i) DNA sequence of the promoter region, (ii) Nucleotide Frequency Matrices (NFM) for canonical promoter elements, (iii) taxonomic and promoter type classification of promoters.
MPromDb / Mammalian Promoter Database
Integrates gene promoters with experimentally supported annotation of transcription start sites, cis-regulatory elements, CpG islands and chromatin immunoprecipitation microarray (ChIP-chip) experimental results with intuitively designed presentation. Users can search the database based on gene id/symbol, or by specific tissue/cell type and filter results based on any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters.
MAPPER database / Multi-genome Analysis of Positions and Patterns of Elements of Regulation
Contains putative Transcription Factor Binding Sites (TFBSs) located in the upstream sequences of genes from the human, mouse and D. melanogaster genomes.
Gives access to Drosophila melanogaster 5’-end mRNA tags at different developmental states. MachiBase is designed to assist fly biologists in their analyses of gene expression and in placing expression data in the context of functional genomics through genomic orientation. Users can access information on differentially expressed genes by either inputting the gene name as a keyword or selecting a chromosomal location. The database can assist biologists in explaining transcriptional initiation mechanisms by combining additional information on chromatin structure and DNA methylation.
A database of DNA binding specificities for Drosophila transcription factors (TFs) primarily determined using the bacterial one-hybrid system. FlyFactorSurvey provides community access to over 400 recognition motifs and position weight matrices for over 200 TFs, including many unpublished motifs. Search tools and flat file downloads are provided to retrieve binding site information (as sequences, matrices and sequence logos) for individual TFs, groups of TFs or for all TFs with characterized binding specificities. Linked analysis tools allow users to identify motifs within our database that share similarity to a query matrix or to view the distribution of occurrences of an individual motif throughout the Drosophila genome. Together, this database and its associated tools provide computational and experimental biologists with resources to predict interactions between Drosophila TFs and target cis-regulatory sequences.
EPD / Eukaryotic Promoter Database
An annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.
A database designed to provide access to reliable annotations of the alternative splicing pattern of human genes, obtained by ASPic algorithm, and to the functional annotation of predicted isoforms.
A database of new exon boundaries induced by pathogenic mutations in human disease genes. This resource will be useful for fine-tuning splice-site prediction algorithms, better definition of auxiliary splicing signals and design of new reporter assays.
Gathers functional annotation for alternatively spliced (AS) genes. ECgene contains the domain, Gene Ontology (GO) and expression pattern analysis based on the EST and SAGE data. It also provides tools to study differential expression pattern which may assist in recognition of tissue- and/or cancer-specific genes. This platform permits users to infer functional significance of each splice variant.
EDAS / EST Derived Alternative Splicing database
A database of alternatively spliced human genes that contains data on the alignment of proteins, mRNAs, and ESTs. EDAS contains information on all exons and introns observed, as well as elementary alternatives formed from them. The database makes it possible to filter the output data by changing the significance-level cut-off threshold.
EID / Exon-Intron Database
Offers a comprehensive and convenient dataset of sequences for computational biologists who study exon-intron gene structures and pre-mRNA splicing. The collection of exons and introns has been extended beyond coding regions and current versions of EID contain data on untranslated regions of gene sequences as well. Intron-less genes are included as a special part of EID. For species with entirely sequenced genomes, species-specific databases have been generated. A novel Mammalian Orthologous Intron Database (MOID) has been introduced which includes the full set of introns that come from orthologous genes that have the same positions relative to the reading frames.
H-DBAS / Human-transcriptome DataBase for Alternative Splicing
Supplies data about human alternative splicing (AS) variants from the viewpoint of the protein functions affected by AS. H-DBAS is based on cDNA information from the H-Invitational cDNA Annotation Project that was manually inspected and annotated. It lets users explore human AS. This database classifies AS events according to whether they are transcribed from conserved genomic regions or whether the corresponding transcripts are also identified in mice.
A free database that provides a list of human internal exons and reports all their known splice events based on EST information from the UCSC Genome Browser. The user can restrict this list to a specific region of the genome (by specifying the chromosome, the strand and the start and end positions), to a whole chromosome or to a group of genes. Furthermore, exons can be filtered according to their splicing type (constitutive exons, cassette exons and exons with one or more alternative 3′ and/or 5′ splice sites).
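The restriction and filtering options described above amount to a conjunction of optional criteria. The sketch below shows one way to express them over an in-memory list. The `Exon` record, its field names and the splicing-type labels are hypothetical and do not reflect the database's actual schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; field names are assumptions for illustration."""
    chrom: str
    strand: str
    start: int
    end: int
    splice_type: str   # e.g. "constitutive", "cassette", "alt_3prime", "alt_5prime"


def filter_exons(exons, chrom=None, strand=None, start=None, end=None, splice_types=None):
    """Keep exons that satisfy every criterion that was supplied (None means no restriction)."""
    selected = []
    for exon in exons:
        if chrom is not None and exon.chrom != chrom:
            continue
        if strand is not None and exon.strand != strand:
            continue
        if start is not None and exon.start < start:
            continue
        if end is not None and exon.end > end:
            continue
        if splice_types is not None and exon.splice_type not in splice_types:
            continue
        selected.append(exon)
    return selected


# Made-up records for demonstration only.
EXAMPLE_EXONS = [
    Exon("chr1", "+", 1000, 1200, "constitutive"),
    Exon("chr1", "+", 5000, 5150, "cassette"),
    Exon("chr1", "-", 5200, 5400, "cassette"),
    Exon("chr2", "+", 300, 450, "alt_5prime"),
]
cassette_on_chr1_plus = filter_exons(
    EXAMPLE_EXONS, chrom="chr1", strand="+", splice_types={"cassette"}
)
```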
This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene.
Yeast Intron Database
Gathers information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. Yeast Intron Database makes an inventory of known spliceosomal introns in the yeast genome and documents the splice sites used. In addition, the database aims to identify and analyze splice site context in terms of the nature and activities of the trans-acting factors that mediate splice site recognition.
A database of spliceosome-associated proteins and snRNAs. SpliceosomeDB provides tools to search for spliceosome genes/proteins based on several characteristics including name(s), complex designation, identification in particular mass spectrometry experiments, source organism and conserved motif/domain signatures. Each gene/protein is linked to additional sources of information and to orthologous genes in several model systems. Tools are also available for comparing the composition of different intermediate splicing complexes and for directly examining the lists of proteins identified in mass spectrometry experiments analyzing purified spliceosome complexes.