ExAC browser
A lightweight, flexible browser framework to display large population datasets of genetic variation. ExAC browser provides gene- and transcript-centric displays of variation, a critical view for clinical applications. Additionally, it provides a variant display, which includes population frequency and functional annotation data as well as short read support for the called variant. The browser is open-source and freely available, and has already been used extensively by clinical laboratories worldwide.
Vancouver Short Read Analysis
A package for collating and searching across thousands of next-generation sequence (NGS) samples. Vancouver Short Read Analysis provides a database that can be installed easily to rapidly store and access genetic variation information, compare data from any sequencing platform and perform aggregate analyses. The schema of the database makes rapid and insightful queries simple and enables easy annotation of novel or known genetic variations. Variants can be filtered using annotations, matched pair datasets or datasets marked as non-cancer to separate polymorphisms from putative variants.
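The filtering idea described above can be pictured as a set operation. The following is a minimal sketch, assuming variants are represented as (chromosome, position, ref, alt) tuples; the function name and data layout are hypothetical and are not taken from the Vancouver Short Read Analysis schema.

```python
# Hypothetical illustration: each variant is a (chrom, pos, ref, alt) tuple.
def putative_variants(tumour, matched_normal, non_cancer_sets):
    """Return tumour variants absent from the matched normal and all non-cancer datasets."""
    known_polymorphisms = set(matched_normal)
    for dataset in non_cancer_sets:
        known_polymorphisms |= set(dataset)
    # Whatever remains after removing known polymorphisms is a candidate variant.
    return sorted(set(tumour) - known_polymorphisms)


tumour = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")]
normal = [("chr1", 100, "A", "T")]
non_cancer = [[("chr3", 300, "C", "A")], []]
candidates = putative_variants(tumour, normal, non_cancer)
```

In this toy example only the chr2 variant survives, because the other two are seen in the matched normal or in a non-cancer dataset.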
A package to explore and visualize a collection of mutational patterns that are relevant for deciphering which mutational processes have been active in a sample. MutationalPatterns facilitates both (1) de novo mutational signature extraction and (2) quantification of the contribution of user-specified mutational signatures. While the first approach can be used to identify new mutational signatures, this is only meaningful for datasets with a large number of samples with diverse mutation spectra, as it relies on the dimensionality reduction method non-negative matrix factorization (NMF). The second approach can be used to study the activity of mutational processes in a single sample, and to further characterize previously-identified mutational signatures by assessing their contribution in different systems or under different conditions.
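The second approach, quantifying the contribution of known signatures to one sample, can be sketched as a non-negative fit. The code below is an illustration only and is not the MutationalPatterns implementation: it applies the multiplicative update rule familiar from NMF with the signatures held fixed, and the function name, the three-category toy profile and the two toy signatures are all assumptions.

```python
def fit_contributions(profile, signatures, iterations=200):
    """Estimate non-negative contributions h so that sum_k h[k] * signatures[k] ~ profile.

    Multiplicative updates with the signatures fixed (a sketch, not the package's method).
    """
    n_sig = len(signatures)
    n_cat = len(profile)
    h = [1.0] * n_sig
    # Numerator of the update: signatures^T * profile (constant across iterations).
    numer = [sum(signatures[k][c] * profile[c] for c in range(n_cat)) for k in range(n_sig)]
    for _ in range(iterations):
        # Current reconstruction of the profile from the signatures.
        recon = [sum(h[k] * signatures[k][c] for k in range(n_sig)) for c in range(n_cat)]
        # Denominator of the update: signatures^T * reconstruction.
        denom = [sum(signatures[k][c] * recon[c] for c in range(n_cat)) for k in range(n_sig)]
        h = [h[k] * numer[k] / denom[k] if denom[k] > 0 else 0.0 for k in range(n_sig)]
    return h


# Toy data: two hypothetical signatures over three mutation categories.
signatures = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
profile = [31.0, 10.0, 59.0]  # built as 30 * signature 1 + 70 * signature 2
contributions = fit_contributions(profile, signatures)
```

Because the toy profile is an exact mixture of the two signatures, the fit recovers contributions close to 30 and 70. Real mutation profiles have many more categories and are noisy, so the fit is only approximate.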
IGSR / International Genome Sample Resource
Expands the resources of the 1000 Genomes Project in both data type and population diversity. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups.
International HapMap Project
A multi-country effort to identify and catalog genetic similarities and differences in human beings. Using the information in the HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared. By making this information freely available, the Project will help biomedical researchers find genes involved in disease and responses to therapeutic drugs.
Provides a resource for data analysis and visualization in a gene-by-gene or genome-wide scale. PlasmoDB is a functional genomic database for Plasmodium spp. It belongs to a family of genomic resources that are housed under the EuPathDB Bioinformatics Resource Center (BRC) umbrella. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop down menus. Various results can then be combined with each other on the query history page.
dbSNP / database of Short Genetic Variations
Provides a public repository for genetic variation. dbSNP includes disease-causing clinical mutations as well as neutral polymorphisms. It links variations (polymorphisms and clinical mutations) to NCBI sequence resources via BLAST and E-PCR analysis. It also facilitates searches along five major axes of information: (i) sequence location, (ii) function, (iii) cross-species homology, (iv) single nucleotide polymorphism (SNP) quality or validation status and (v) degree of heterozygosity (degree of population variation).
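As an illustration of the fifth axis, heterozygosity is commonly summarized as expected heterozygosity, H = 1 - sum of squared allele frequencies. The sketch below assumes that common definition; the function name and allele counts are hypothetical, and dbSNP's own computation may differ.

```python
def expected_heterozygosity(allele_counts):
    """Expected heterozygosity H = 1 - sum(p_i ** 2) from observed allele counts."""
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((n / total) ** 2 for n in allele_counts.values())


# Hypothetical SNP with 150 observed copies of A and 50 copies of G.
het = expected_heterozygosity({"A": 150, "G": 50})
```

With allele frequencies of 0.75 and 0.25 the result is 0.375; a monomorphic site gives 0.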
Provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery. DiscovEHR is a browser which contains variant frequency data to enable allele frequency comparisons with other population-based and biobank resources. The ability to search by gene, rs#, and position is also available. The DiscovEHR collaboration study cohort is derived from individuals who consented to participate in Geisinger’s MyCode Community Health Initiative.
A WGS imputation panel comprising 3,781 samples from the UK10K Cohorts project. This project has generated a data set of 3,781 whole genomes sequenced at low depth aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. The UK10K project will enable researchers in the UK and beyond to better understand the link between low-frequency and rare genetic changes, and human disease caused by harmful changes to the proteins the body makes.
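A short calculation shows why a panel of this size can reach 0.1% minor allele frequency. It assumes autosomal, diploid samples, and it says nothing about whether low-depth sequencing actually detects each copy; the variable names are illustrative.

```python
n_samples = 3781   # whole genomes in the panel (from the description above)
maf = 0.001        # 0.1% minor allele frequency

n_chromosomes = 2 * n_samples            # assumption: diploid, autosomal sites
expected_copies = maf * n_chromosomes    # expected minor allele copies in the panel
# Probability that at least one copy is present among the sampled chromosomes.
p_present = 1 - (1 - maf) ** n_chromosomes
```

Under these assumptions a variant at 0.1% frequency is expected to appear about 7.6 times among the 7,562 sampled chromosomes, so it is very likely to be present in the panel at least once.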
A knowledge base and aggregator for human genomic variants. VarSome gives fast access to more than 33 billion items of variant and gene annotation sourced from public databases. It also provides advanced visualisations and features such as functional annotation for any variant, known or unknown, as well as automated variant classification based on professional guidelines. Its tens of thousands of users add a fast-growing amount of their own insights on its pages. Users may also use the associated API to pull data directly into their software.
UMD-BRCA1 / Universal mutation database-BRCA1
Gathers data on French BRCA1 variants. UMD-BRCA1 is a database that merges anonymized data from commercial entities and registered research. The repository is divided into six sections displaying information related to the gene, the protein, the clinics and mutations. In addition, the database gives access to statistics as well as the references corresponding to the targeted gene. The platform is coupled with UMD-BRCA2 under the BRCA Share name.
UMD-BRCA2 / Universal mutation database-BRCA2
Compiles information on French BRCA2 variants. UMD-BRCA2 presents anonymized data from commercial entities and registered research. The repository is divided into six sections displaying data related to the gene, the protein, the clinics and mutations. In addition, the database gives access to statistics as well as the references corresponding to the targeted gene. The platform is coupled with UMD-BRCA1 under the BRCA Share name.
DGV / Database of Genomic Variants
Provides a comprehensive summary of structural variation in the human genome. Structural variation is defined as genomic alterations that involve segments of DNA larger than 50 bp. The database contains only structural variation identified in healthy control samples. The Database of Genomic Variants provides a useful catalog of control data for studies aiming to correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer-reviewed research studies.
A data set consisting primarily of relatively rare human genomic CNVs that were derived from 2026 healthy individuals using high-density, SNP-based oligonucleotide microarrays. The generation of this resource is aimed at serving as a reference to aid in the investigation of the clinical significance of CNVs detected in disease cohorts. CHOP CNV is a valuable resource to investigators for applications in clinical diagnostics as well as in CNV enrichment and association studies for particular disease cohorts.
Platinum Genomes
A genome-wide catalogue of 5.4 million “platinum” variants. We generated deep, whole-genome sequence data of seventeen individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a “platinum” variant catalogue of 4.7 million single nucleotide variants (SNVs) plus 0.7 million small (1-50bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and eleven children of this pedigree. Platinum genotypes are highly concordant with the current catalogue of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%), and add a validated truth catalogue that has 26% more SNVs and 45% more indels.
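The inheritance-consistency idea can be illustrated with a much simpler per-site Mendelian check. This is a sketch only: the Platinum Genomes catalogue relies on haplotype transmission across the whole pedigree, which is stricter than the check below, and the function names and genotype encoding (a pair of allele strings) are assumptions.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be split as one from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def consistent_in_all_children(children, mother, father):
    """Keep a site only if every child's genotype is compatible with the parents."""
    return all(is_mendelian_consistent(c, mother, father) for c in children)


mother, father = ("A", "A"), ("A", "G")
ok_site = consistent_in_all_children([("A", "A"), ("A", "G")], mother, father)
bad_site = consistent_in_all_children([("A", "G"), ("G", "G")], mother, father)
```

Here the second site is rejected because a G/G child cannot arise from an A/A mother under Mendelian inheritance.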
A genomic database for Giardia lamblia. GiardiaDB is based on the genome of the WBC6 clinical isolate of G. lamblia. It is accessed via the standard EuPathDB web interface, providing a wide variety of tools for genomic database mining. In addition to BLAST and pattern/motif similarity searches, users can identify genes based on genomic position; common name or keyword; gene attributes (such as gene type, or number of exons); evidence of transcript expression including ESTs, SAGE tags, microarray and proteomics; gene product annotation (such as GO function, or EC enzyme number); and predicted cellular location (based on signal peptide and transmembrane predictions).
A database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of change and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion.
Collects gene-centered background De Novo Germline Mutation Rates (DNMRs) obtained from different methods and population variation data. MirDNMR provides the following functions: (i) browse and search the background DNMRs of each gene predicted by four different methods, including GC content (DNMR-GC), sequence context (DNMR-SC), multiple factors (DNMR-MF) and local DNA methylation level (DNMR-DM); (ii) search variant frequencies in publicly available databases, including ExAC, ESP6500, UK10K, 1000G and dbSNP and (iii) investigate the DNM burden to prioritize candidate genes based on the four background DNMRs using three statistical methods.
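A burden analysis of this kind compares the number of de novo mutations observed in a gene with the number expected from its background rate. The description above does not name the three statistical methods, so the sketch below assumes a one-sided Poisson test purely for illustration; the function name, the cohort size and the per-gene rate are hypothetical.

```python
import math


def poisson_burden_pvalue(observed, n_trios, dnmr_per_gene):
    """Return (expected DNM count, P(X >= observed)) under a Poisson model.

    Assumption: dnmr_per_gene is a per-chromosome, per-generation rate, so each
    child contributes two chances for a de novo mutation in the gene.
    """
    expected = 2 * n_trios * dnmr_per_gene
    # P(X < observed) summed term by term; the empty sum (observed == 0) is 0.
    cdf_below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i) for i in range(observed)
    )
    return expected, 1.0 - cdf_below


# Hypothetical gene: 3 DNMs seen in 1,000 trios, background rate 5e-5.
expected, p_value = poisson_burden_pvalue(3, 1000, 5e-5)
```

Genes with small p-values under each of the background rates would then be ranked higher as candidates.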
Provides potential inter-species copy number variations (CNV) information. CNVVdb is a database that allows users to find duplicated regions within a genome (paralogues) and between different genomes (orthologues). It supplies multiple sequence alignments of all paralogues/orthologues identified. The output of this resource includes: (1) information summary about the query region in the target species, (2) information summary about the matched region(s) in the subject species, (3) detailed information for each matched region and (4) multiple sequence alignments of the identified paralogues/orthologues.
An online resource containing a range of genomic datasets for wheat (Triticum aestivum) that will assist plant breeders and scientists to select the most appropriate markers for marker assisted selection. CerealsDB includes a database which currently contains in excess of 100,000 putative varietal SNPs, of which several thousand have been experimentally validated. In addition, CerealsDB contains databases for DArT markers and EST sequences, and links to a draft genome sequence for the wheat variety Chinese Spring.
TMC-SNPdb / Tata Memorial Centre-SNP database
A database of ‘normal’ germline variants derived from an Indian (non-European Caucasian) population. TMC-SNPdb is the first open-source, flexible, upgradable, and freely available single nucleotide polymorphism (SNP) database of variants not yet included in the public databases, which predominantly represent Caucasian populations. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. The database can be customized by including similar datasets from additional individuals.
HGSVP / Human Genome Structural Variation Project
A catalogue of human genomic polymorphisms ascertained by experimental and computational analyses. This database includes large-scale structural variation (LSV), copy number polymorphisms (CNPs) and intermediate-sized structural variation (ISV) as determined by array comparative genomic hybridization (Iafrate et al., 2004; Sharp et al., 2005), representational oligonucleotide microarray analysis (Sebat et al., 2004) and fosmid paired-end sequence analysis (Tuzun et al., 2005). The data are mapped against the UCSC Human Genome Browser.
Assists in storing genomic variation data obtained from next-generation sequencing experiments. Varda is an application that holds data such as full-genome or exome sequencing of individuals or populations. Variants can be imported from standard formats and annotated with their frequencies. It was designed to support (i) a private exome variant database for a sequencing lab, (ii) a shared database between several groups and (iii) public sharing of variant frequencies from a population study.
