Genotype imputation software tools | Genome-wide association study data analysis
Genotype imputation has been widely adopted in the post-genome-wide association study (GWAS) era. Because it can accurately predict the genotypes of untyped variants, imputation greatly increases variant density, enabling fine-mapping of GWAS loci and large-scale meta-analyses across different genotyping arrays.
A free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
A program for the analysis of single-SNP association in genome-wide studies. Its features include (1) tests for binary (case-control) phenotypes and for single and multiple quantitative phenotypes, (2) Bayesian and frequentist tests, (3) the ability to condition on an arbitrary set of covariates and/or SNPs, and (4) several different methods for dealing with imputed SNPs.
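One common way to handle imputed SNPs in an association test is to replace each hard genotype call with its expected allele count (dosage), computed from the imputed genotype probabilities, and then regress the phenotype on that dosage. The sketch below assumes a quantitative phenotype, an additive model and no covariates; the function names are hypothetical, and it does not reproduce any of this program's own methods or options.

```python
import numpy as np

def expected_dosage(probs):
    """Expected allele count from per-sample genotype probabilities.

    probs: (n_samples, 3) array of P(AA), P(AB), P(BB), as produced by
    imputation software; the dosage counts copies of allele B.
    """
    probs = np.asarray(probs, dtype=float)
    return probs[:, 1] + 2.0 * probs[:, 2]

def dosage_wald_test(phenotype, dosage):
    """Least-squares fit of phenotype ~ 1 + dosage (additive model).

    Returns (beta, se, z): the per-allele effect, its standard error and
    the Wald statistic beta / se.
    """
    y = np.asarray(phenotype, dtype=float)
    d = np.asarray(dosage, dtype=float)
    X = np.column_stack([np.ones_like(d), d])        # intercept + dosage
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], se, coef[1] / se
```

A sample with probabilities (0.2, 0.5, 0.3) gets dosage 0.5 + 2 × 0.3 = 1.1, so uncertain imputations contribute fractional allele counts instead of being forced to 0, 1 or 2.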
Assists users in studying phased genotypes. Minimac is an application that can handle large reference panels with hundreds or thousands of haplotypes. This application is based on MaCH, an algorithm for genotype imputation. It supports the imputation of genotypes on the X chromosome. It relies on a two-step approach: (i) the samples that will be analyzed must be phased into a series of estimated haplotypes and (ii) imputation is carried out directly into these phased haplotypes.
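The second step, imputation into phased haplotypes, is commonly built on a haplotype-copying hidden Markov model in which each target haplotype is modelled as a mosaic of reference haplotypes. The toy sketch below assumes a Li and Stephens style model with a constant switch probability `rho` and mismatch probability `eps` (both hypothetical fixed values); Minimac's own parameter estimation, genetic-map handling and optimizations are not reproduced.

```python
import numpy as np

def impute_haplotype(ref, target, rho=0.01, eps=0.01):
    """Toy haplotype-copying HMM for one phased target haplotype.

    ref:    (K, M) array of 0/1 alleles for K phased reference haplotypes.
    target: length-M array; 0/1 at typed sites, -1 at untyped sites.
    rho:    hypothetical per-interval switch probability (constant here).
    eps:    hypothetical allele mismatch probability.
    Returns the posterior probability of allele 1 at every site.
    """
    ref = np.asarray(ref, dtype=float)
    target = np.asarray(target)
    K, M = ref.shape

    # Emission probabilities; untyped sites carry no information (all ones).
    emit = np.ones((M, K))
    for m in range(M):
        if target[m] >= 0:
            emit[m] = np.where(ref[:, m] == target[m], 1.0 - eps, eps)

    # Forward pass, normalised at each site to avoid underflow.
    fwd = np.empty((M, K))
    f = emit[0] / K
    fwd[0] = f / f.sum()
    for m in range(1, M):
        # Stay on the same haplotype with prob (1 - rho), or jump uniformly
        # to any of the K haplotypes with prob rho (fwd[m-1] sums to 1).
        pred = (1.0 - rho) * fwd[m - 1] + rho / K
        f = pred * emit[m]
        fwd[m] = f / f.sum()

    # Backward pass with the same transition model.
    bwd = np.ones((M, K))
    for m in range(M - 2, -1, -1):
        b = bwd[m + 1] * emit[m + 1]
        b = (1.0 - rho) * b + rho * b.sum() / K
        bwd[m] = b / b.sum()

    # Posterior over which reference haplotype is copied at each site.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    # Expected allele = posterior-weighted average of reference alleles.
    return (post * ref.T).sum(axis=1)
```

Because the target is already phased, each haplotype is imputed separately and a diploid dosage is simply the sum of the two haplotype results; this is what makes the pre-phasing step pay off computationally.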
A computer program for phasing observed genotypes and imputing missing genotypes. IMPUTE v2 improves imputation accuracy and can combine information across multiple reference panels while remaining computationally feasible. When HapMap provides the sole reference panel, IMPUTE v2 attains higher accuracy than other methods, but the size of the panel limits the improvements that can be made.
Provides a genotype imputation and phasing service. Sanger Imputation Service is a web application that allows users to upload genome-wide association study (GWAS) data and receive imputed and phased genomes back. Optional pre-phasing is performed with EAGLE2 or SHAPEIT2, and imputation uses the Positional Burrows-Wheeler Transform (PBWT) with a choice of reference panels, including 1000 Genomes Phase 3, UK10K and the Haplotype Reference Consortium. The service is aimed at researchers who want to impute many thousands of GWAS samples against a consistent reference in a consistent manner.
Provides a genotype imputation service using Minimac3. Michigan Imputation Server allows users to upload phased or unphased genome-wide association study (GWAS) genotypes and receive phased and imputed genomes in return. The server offers imputation with the HapMap, 1000 Genomes (Phase 1 and Phase 3), Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) and Haplotype Reference Consortium (HRC) reference panels.
Assists users in studying phased genotypes. Minimac3 is an application that can handle large reference panels with hundreds or thousands of haplotypes. This application is based on MaCH, an algorithm for genotype imputation. It supports the imputation of genotypes on the X chromosome. It exploits similarities among haplotypes in small genomic segments to reduce the effective number of states over which the hidden Markov model (HMM) iterates.
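The idea of reducing the effective state space can be illustrated by collapsing reference haplotypes that are identical across a short segment, so that an HMM iterating within that segment visits one state per distinct haplotype rather than one per reference haplotype. This is a simplified sketch with a hypothetical function name; how Minimac3 chooses segment boundaries and carries probabilities between segments is not shown.

```python
def collapse_segment(haplotypes):
    """Group identical reference haplotypes within one short segment.

    haplotypes: list of allele sequences, one per reference haplotype,
                restricted to the segment.
    Returns (unique, counts, index): the distinct haplotypes, how many
    reference haplotypes share each one, and, for every original
    haplotype, the position of its distinct representative.
    """
    position = {}  # distinct haplotype -> its position in `unique`
    index = []
    for hap in haplotypes:
        key = tuple(hap)
        if key not in position:
            position[key] = len(position)
        index.append(position[key])
    unique = list(position)  # dicts keep insertion order
    counts = [0] * len(unique)
    for i in index:
        counts[i] += 1
    return unique, counts, index
```

For the segment `[[0, 1, 0], [0, 1, 0], [1, 1, 0], [0, 1, 0]]`, four reference haplotypes collapse to two distinct states with counts `[3, 1]`; with large panels, short segments typically contain far fewer distinct haplotypes than the panel size, which is where the saving comes from.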
A method for estimating haplotypes, using genotype data from unrelated samples or small nuclear families, that leads to improved accuracy and speed compared to several widely used methods. SHAPEIT scales linearly with the number of haplotypes used in each iteration and can be run efficiently on whole chromosomes.
MACH can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals. Specifically, it can estimate haplotypes; impute missing genotypes in a variety of populations, using the HapMap sample or another set of densely genotyped individuals as a reference; analyze shotgun resequencing data from high-throughput sequencing technologies; and carry out simple tests of association.
Leverages genetic information for imputing rice datasets. Rice Imputation Server uses IMPUTE2 as its imputation engine. Users can upload their data, such as genome-wide single nucleotide polymorphism (SNP) genotypes generated by genotyping-by-sequencing (GBS) methods, and receive imputed data sets back. It provides “SNP filters” to facilitate downstream trimming of imputed data sets, as well as other filtering options, including basic PLINK 1.9 utilities. The server aims to make imputation more accessible to rice researchers throughout the world.
Provides a linkage-disequilibrium-based framework for genotype inference in parent-offspring trios. TrioCaller implements a method to call genotypes and infer haplotypes from whole-genome shotgun sequencing data collected in trios, unrelated individuals or parent-offspring pairs. The software can facilitate genotype calling and haplotype inference in sequencing projects.
Aggregates the association strength of individual markers into pre-specified biological pathways. VEGAS2 is a versatile pathway-based approach for genome-wide association study (GWAS) data that accounts for gene size and linkage disequilibrium (LD) between markers using simulations from the multivariate normal distribution. First, it calculates gene-based test statistics for all genes using the VEGAS (VErsatile Gene-based Association Study) approach, which accounts for LD between the single nucleotide polymorphisms (SNPs) within a gene through simulation. Second, for each pre-specified gene set, the relevant gene-based results are carried forward to compute a pathway-based test.
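The first, gene-based step can be sketched as follows, assuming per-SNP z-scores and an LD correlation matrix for the SNPs in one gene are already available: the gene statistic is the sum of the SNP chi-square statistics, and its null distribution is simulated from a multivariate normal whose covariance is the LD matrix. The function name, the fixed number of simulations and the +1 correction are illustrative choices, not VEGAS2's implementation, and the pathway step is not shown.

```python
import numpy as np

def vegas_gene_pvalue(z_scores, ld, n_sim=100_000, seed=0):
    """Simulation-based gene-level p-value in the style of VEGAS.

    z_scores: per-SNP association z-scores for the SNPs in one gene.
    ld:       SNP-by-SNP LD correlation matrix for those SNPs.
    """
    z = np.asarray(z_scores, dtype=float)
    ld = np.asarray(ld, dtype=float)
    # Gene statistic: sum of the 1-df chi-square statistics (z^2).
    observed = float(np.sum(z ** 2))
    # Null: z-scores drawn from a multivariate normal with the LD matrix as
    # covariance, so correlated SNPs are not counted as independent evidence.
    rng = np.random.default_rng(seed)
    sims = rng.multivariate_normal(np.zeros(z.size), ld, size=n_sim)
    null_stats = np.sum(sims ** 2, axis=1)
    # Empirical p-value with the usual +1 correction.
    return (np.count_nonzero(null_stats >= observed) + 1) / (n_sim + 1)
```

Simulating under the LD matrix is what lets the test account for gene size: a long gene with many correlated SNPs gets a correspondingly wider null distribution for its summed statistic.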
A statistical model for patterns of genetic variation in samples of unrelated individuals from natural populations. fastPHASE is based on the idea that, over short regions, haplotypes in a population tend to cluster into groups of similar haplotypes. For imputing missing genotypes, methods based on this model are as accurate as or more accurate than existing methods. For haplotype estimation, the point estimates are slightly less accurate than those from the best existing methods but require only a small fraction of the computational cost.
A command-line program for the statistical analysis of SNP-disease association in case-control/cohort/cross-sectional studies with potentially missing genotype data. SNPMStat allows the user to estimate or test SNP effects and SNP-environment interactions by maximizing the (observed-data) likelihood that properly accounts for phase uncertainty, study design and gene-environment dependence.
Offers a platform for performing genome-wide association studies (GWAS) based on haplotypes. ParaHaplo uses data parallelism to speed up the assessment of both haplotypes and P values. The application can be used in conjunction with other software for (i) genotype imputation and haplotype reconstruction, (ii) haplotype estimation and (iii) haplotype-based GWAS.
Performs both genotype calling and imputation. LinkImpute uses sequence read information. It lets users investigate the effects of missingness and read depth thresholds on the size and accuracy of the resulting genotype table. The tool offers a way for researchers to explore a range of quality thresholds prior to imputation and determine which set of parameters best suits their research needs. It can be useful for generating large, high-quality genome-wide genotype data, especially from non-model organisms.
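The threshold exploration described above can be sketched as a grid over minimum read depth and maximum per-SNP missingness, reporting how large the resulting genotype table is and how many cells remain for imputation to fill. The data layout (missing calls coded as -1, one depth value per call) and the function name are assumptions; LinkImpute's own imputation and accuracy estimation are not reproduced.

```python
import numpy as np

def threshold_grid(genotypes, depth, depth_thresholds, max_missing_rates):
    """Size of the genotype table under each (depth, missingness) setting.

    genotypes: (n_samples, n_snps) integer calls, -1 = missing.
    depth:     (n_samples, n_snps) read depth behind each call.
    Returns {(min_depth, max_missing): (n_samples, n_snps_kept, n_missing)},
    where n_missing counts the cells left for imputation to fill.
    """
    genotypes = np.asarray(genotypes)
    depth = np.asarray(depth)
    grid = {}
    for min_depth in depth_thresholds:
        # A call counts only if present and supported by >= min_depth reads.
        called = (genotypes >= 0) & (depth >= min_depth)
        missing_rate = 1.0 - called.mean(axis=0)  # per-SNP missingness
        for max_missing in max_missing_rates:
            keep = missing_rate <= max_missing
            n_missing = int((~called[:, keep]).sum())
            grid[(min_depth, max_missing)] = (
                genotypes.shape[0], int(keep.sum()), n_missing)
    return grid
```

Raising the depth threshold makes the retained calls more reliable but pushes more SNPs over the missingness limit, so the grid shows the trade-off between table size and the amount of imputation required.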
Assists users in studying phased genotypes. Minimac2 is designed to handle large reference panels with hundreds or thousands of haplotypes. This application is based on MaCH, an algorithm for genotype imputation. It can be used for two-step genotype imputation: (i) estimating haplotypes for the entire sample and (ii) imputing missing genotypes using the reference panel of the user's choice.
A publicly available SNP and indel imputability database, aiming to provide direct access to imputation accuracy information for markers identified by the 1000 Genomes Project across four major populations and covering multiple GWAS genotyping platforms. SNP and indel imputability information can be retrieved through a user-friendly interface by providing the ID(s) of the desired variant(s) or by specifying the desired genomic region. The query results can be refined by selecting relevant GWAS genotyping platform(s). This is the first database providing variant imputability information specific to each continental group and to each genotyping platform.
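Imputation accuracy for a variant is often summarized as the squared correlation (r²) between imputed dosages and genotypes observed directly, for example on variants masked before imputation. The database's exact accuracy metric is not stated here, so the sketch below only illustrates this common measure; the function name is hypothetical.

```python
import numpy as np

def dosage_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true genotypes and dosages.

    true_genotypes: allele counts (0, 1, 2) from direct genotyping or
                    sequencing, e.g. for variants masked before imputation.
    dosages:        imputed expected allele counts for the same samples.
    Returns NaN when either vector is constant (correlation undefined).
    """
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(dosages, dtype=float)
    if t.std() == 0 or d.std() == 0:
        return float("nan")
    r = np.corrcoef(t, d)[0, 1]
    return float(r * r)
```

Because r² depends on how well the typed markers tag the variant, the same variant can have different values for different genotyping platforms and populations, which is why platform- and population-specific lookups are useful.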
A method for performing diploid genotype imputation based on the hidden Markov model. FISH is suitable for large-scale dataset analyses. Both simulation studies and real-data analyses demonstrated that FISH was comparable to most of the existing popular methods in terms of imputation accuracy, but was much more efficient in terms of computation.
Enables quality control and imputation of genome-wide association study (GWAS) data. Gimpute is a genotype data processing and imputation pipeline that includes steps for genotype liftOver, quality control, population outlier detection, haplotype pre-phasing, imputation, post-imputation processing and data management. Thanks to its modular structure, the software can be combined with existing pipelines. It is applicable to any study design.