Haplotype phase inference software tools
Two categories of computational methods exist for determining haplotypes: haplotype phasing and haplotype assembly. Haplotype assembly reconstructs haplotypes from sequencing reads, whereas haplotype phasing takes the genotypes of a sample of individuals from a population and infers their haplotypes from the haplotype sharing within the sample. In the related problem of genotype imputation, a phased reference panel is used to infer missing markers and the haplotype phase of the sample. Phasing and imputation methods rely on a range of computational and statistical inference techniques, but both exploit two facts: closely spaced markers tend to be in linkage disequilibrium, and short haplotype blocks are often shared among seemingly unrelated individuals in a population.
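To make the idea of phasing from haplotype sharing concrete, here is a toy Python sketch of Clark's parsimony rule, an early phasing heuristic; it is not the method of any tool listed on this page. It assumes biallelic markers with genotypes coded as 0, 1 or 2 copies of the alternative allele, and the function names compatible_pairs and clark_phase are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return every unordered haplotype pair whose alleles sum to `genotype`.

    Genotypes are tuples of 0/1/2 (copies of the alternative allele);
    haplotypes are tuples of 0/1. Illustrative sketch only.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1.
        h1 = [g // 2 for g in genotype]
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
        # The second haplotype carries whatever remains at each site.
        h2 = [g - a for g, a in zip(genotype, h1)]
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs


def clark_phase(genotypes):
    """Toy Clark's rule: seed with unambiguous genotypes, then resolve the
    others using a haplotype that has already been observed in the sample."""
    known, resolved = set(), {}
    for i, g in enumerate(genotypes):
        pairs = compatible_pairs(g)
        if len(pairs) == 1:  # at most one heterozygous site
            pair = next(iter(pairs))
            resolved[i] = pair
            known.update(pair)
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for pair in sorted(compatible_pairs(g)):
                if pair[0] in known or pair[1] in known:
                    resolved[i] = pair
                    known.update(pair)
                    changed = True
                    break
    return resolved
```

For example, clark_phase([(2, 1, 0), (1, 2, 1)]) returns {0: ((1, 0, 0), (1, 1, 0)), 1: ((0, 1, 1), (1, 1, 0))}: the first genotype is unambiguous, and its haplotype (1, 1, 0) resolves the second. Enumerating all phasings of a genotype with k heterozygous sites grows as 2^(k-1), which is one reason modern tools use statistical models instead.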
Performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent (IBD) segment detection. Beagle can be applied to genome-wide single nucleotide polymorphism (SNP) data from thousands of samples, and it can detect short IBD tracts. The tool uses composite reference haplotypes to model large genomic regions with a parsimonious statistical model.
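Beagle detects IBD with its own haplotype model; the Python sketch below does something much cruder, listing long runs of identical alleles (identity by state, IBS) between two phased haplotypes, which is a common first intuition for what a shared tract looks like. The function name ibs_segments and the min_markers threshold are hypothetical, and IBS runs are only a proxy for true IBD.

```python
def ibs_segments(hap_a, hap_b, min_markers):
    """Return (start, end) marker ranges, end exclusive, of maximal runs where
    two phased haplotypes carry identical alleles, keeping runs of at least
    `min_markers` markers. Naive IBS scan, not Beagle's IBD method."""
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i  # a new identical run begins here
        else:
            # A mismatch closes the current run; keep it if long enough.
            if start is not None and i - start >= min_markers:
                segments.append((start, i))
            start = None
    n = min(len(hap_a), len(hap_b))
    if start is not None and n - start >= min_markers:
        segments.append((start, n))  # run reaching the last marker
    return segments
```

With a = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1] and b = [0, 1, 1, 0, 0, 0, 0, 1, 1, 1], the single mismatch at marker 4 splits the haplotypes into runs of 4 and 5 markers, so min_markers=5 keeps only (5, 10).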
Performs genetic association analysis. UNPHASED lets users analyze data from nuclear families and unrelated subjects, with either discrete or quantitative traits. It provides global association tests, tests of individual haplotypes, and permutation tests that correct for multiple testing. The method supports non-genetic covariates and can also account for parent-of-origin effects.
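As a rough illustration of how a permutation test works for haplotype association, the Python sketch below compares the frequency of one haplotype between cases and controls and builds the null distribution by shuffling the case/control labels. This is a generic test on already phased haplotypes, not UNPHASED's likelihood-based procedure; the names haplotype_freq and permutation_pvalue are hypothetical. Correcting across several haplotypes would typically record the maximum statistic over all tests in each permutation.

```python
import random


def haplotype_freq(haplotypes, target):
    """Fraction of haplotypes equal to `target`."""
    return sum(h == target for h in haplotypes) / len(haplotypes)


def permutation_pvalue(cases, controls, target, n_perm=2000, seed=1):
    """Permutation p-value for |freq(cases) - freq(controls)| of one haplotype.

    Generic sketch: labels are shuffled, the statistic is recomputed, and the
    p-value counts permutations at least as extreme as the observed value.
    """
    observed = abs(haplotype_freq(cases, target) - haplotype_freq(controls, target))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    rng = random.Random(seed)  # fixed seed for reproducibility
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(haplotype_freq(pooled[:n_cases], target)
                   - haplotype_freq(pooled[n_cases:], target))
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_perm + 1)
```

When cases carry only haplotype "AB" and controls only "ab", almost no shuffle reproduces the observed split, so the p-value is close to 1/(n_perm + 1); when both groups have the same composition, every permutation is at least as extreme and the p-value is 1.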
Assembles noisy single-molecule sequencing reads. Canu introduces several features, including computational resource discovery, adaptive k-mer weighting, automated error-rate estimation, sparse graph construction, and Graphical Fragment Assembly (GFA) output. The pipeline consists of three stages: correction, trimming, and assembly. Because it auto-detects the available resources, Canu can configure itself to make full use of the hardware it runs on.
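One way to picture adaptive k-mer weighting is inverse document frequency: k-mers that occur in many reads (often repeats) receive low weight, while rare k-mers receive high weight. The Python sketch below assumes that reading and is not taken from Canu's source; the formula log(N / df) and the names kmers and idf_weights are illustrative assumptions.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Set of distinct k-mers in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def idf_weights(reads, k):
    """Assumed inverse-document-frequency weighting, not Canu's exact scheme:
    weight = log(number of reads / number of reads containing the k-mer)."""
    doc_freq = Counter()
    for read in reads:
        doc_freq.update(kmers(read, k))  # count each k-mer once per read
    n = len(reads)
    return {kmer: math.log(n / df) for kmer, df in doc_freq.items()}
```

For the reads ACGTAC, ACGTTT and ACGGGG with k = 3, ACG occurs in every read and gets weight 0, CGT occurs in two reads and gets log(3/2), and TAC occurs once and gets log(3).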
A computational method for quantifying genetic diversity in a mixed sample and identifying the individual clones in the population while accounting for sequencing errors. The approach also gives the user an estimate of the quality of the reconstruction. In addition, ShoRAH can reconstruct global haplotypes and estimate their frequencies. Its reliability was assessed on simulated data and on real data from wet-lab experiments.
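Estimating haplotype frequencies from reads that fit several candidate haplotypes is a classic expectation-maximization (EM) problem. The Python sketch below is a plain EM under a simplifying assumption of error-free reads, where each read is described only by the set of candidate haplotypes it is consistent with; it is not ShoRAH's full model, and the name em_frequencies is hypothetical.

```python
def em_frequencies(compat, n_haps, n_iter=200):
    """EM estimate of haplotype frequencies.

    compat[r] is the set of candidate haplotype indices consistent with read r
    (assumed error-free). Illustrative sketch, not ShoRAH's algorithm.
    """
    freqs = [1.0 / n_haps] * n_haps  # start from uniform frequencies
    for _ in range(n_iter):
        expected = [0.0] * n_haps
        # E-step: split each read among its compatible haplotypes
        # in proportion to the current frequency estimates.
        for hap_set in compat:
            total = sum(freqs[h] for h in hap_set)
            for h in hap_set:
                expected[h] += freqs[h] / total
        # M-step: new frequency = expected share of all reads.
        freqs = [e / len(compat) for e in expected]
    return freqs
```

With six reads unique to haplotype 0, two unique to haplotype 1, and two shared by both, the estimates converge to 0.75 and 0.25, and a candidate supported by no read drops to 0.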
Provides a method for imputation in different types of populations. AlphaImpute uses information from close and distant single nucleotide polymorphism (SNP) loci to impute genotypes for individuals with incomplete or no genotype information that have close or distant relatives who are densely genotyped. The tool can also impute phased alleles, or provide allele probabilities, for animals in the pedigree that have low-density genotypes or no genotypes.
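The simplest pedigree-based imputation rule follows directly from Mendelian inheritance: if both parents are homozygous at a marker, the offspring's genotype is fixed. The Python sketch below applies only that rule, ignoring genotyping errors and mutation; AlphaImpute combines far richer pedigree and population information, and the name impute_from_parents is hypothetical.

```python
def impute_from_parents(child, sire, dam):
    """Fill missing child genotypes where both parents are homozygous.

    Genotypes are coded 0/1/2 (copies of the alternative allele) and None
    marks a missing call. Simplified Mendelian rule, not AlphaImpute itself.
    """
    imputed = []
    for c, s, d in zip(child, sire, dam):
        if c is None and s in (0, 2) and d in (0, 2):
            # Each homozygous parent transmits its only allele: 0 or 1 copy.
            c = s // 2 + d // 2
        imputed.append(c)
    return imputed
```

For child [None, 1, None, None] with sire [2, 1, 0, 1] and dam [2, 0, 2, 0], the result is [2, 1, 1, None]: the last marker stays missing because the sire is heterozygous.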
An algorithm for haplotype resolution and block partitioning. It uses a stochastic model of genotype generation based on the biological observation that genotypes can be partitioned into blocks of low recombination rate, each containing a small number of common haplotypes. The model relies on the notion of a probabilistic common haplotype, which can take different forms in different genotypes, thereby accommodating errors, rare recombination events, and mutations. GERBIL was shown to be fast and accurate even on data sets of many hundreds of individuals.
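The block idea can be illustrated with a greedy rule on already phased haplotypes: extend a block marker by marker until it would show more than a chosen number of distinct haplotypes. The Python sketch below is only that toy rule, not GERBIL's stochastic model; the name greedy_blocks and the max_haps threshold are hypothetical.

```python
def greedy_blocks(haplotypes, max_haps):
    """Partition markers into (start, end) blocks, end exclusive.

    A block is extended while it shows at most `max_haps` distinct haplotype
    strings; a single-marker block is always accepted. Toy greedy rule only.
    """
    n_markers = len(haplotypes[0])
    blocks = []
    start = 0
    while start < n_markers:
        end = start + 1
        while end < n_markers:
            # Distinct haplotypes if the block also included marker `end`.
            distinct = {tuple(h[start:end + 1]) for h in haplotypes}
            if len(distinct) > max_haps:
                break
            end += 1
        blocks.append((start, end))
        start = end
    return blocks
```

With the four haplotypes 00011, 00000, 11111 and 11100 and max_haps=2, the first three markers carry only 000 and 111, but adding the fourth marker yields four distinct strings, so the partition is [(0, 3), (3, 5)].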
Provides a method for efficiently phasing large data sets. HAPI-UR was developed for large genotype data sets of unrelated individuals and/or trio and duo samples. Because the number of states HAPI-UR uses in any window depends on the haplotype structure and diversity of the data, the method adapts to the nature of the data set.
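To show why a window-based state space adapts to the data, the Python sketch below counts the distinct haplotype strings in each fixed-size window, taking that count as a stand-in for the number of model states. The assumption that states correspond to distinct windowed haplotypes is a simplification for illustration, and the name states_per_window is hypothetical.

```python
def states_per_window(haplotypes, window):
    """Count distinct haplotype strings in consecutive windows of `window`
    markers; used here as an assumed proxy for per-window model states."""
    n_markers = len(haplotypes[0])
    counts = []
    for start in range(0, n_markers, window):
        distinct = {tuple(h[start:start + window]) for h in haplotypes}
        counts.append(len(distinct))
    return counts
```

For the haplotypes 0000, 0011, 0001 and 0011 with two-marker windows, the first window is monomorphic and needs one state, while the more diverse second window needs three, giving [1, 3].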