Population structure detection software tools | Population genetics data analysis
Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges.
A free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.
fastSTRUCTURE estimates approximate posterior distributions on ancestry proportions two orders of magnitude faster than STRUCTURE, with ancestry estimates and prediction accuracies comparable to those of ADMIXTURE.
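STRUCTURE, fastSTRUCTURE and ADMIXTURE all build on the same admixture model: an individual's expected allele frequency at a SNP is the ancestry-weighted average of the ancestral population frequencies, and the observed allele count is binomial around it. The short NumPy sketch below evaluates that log-likelihood for given ancestry proportions Q and frequencies F. It is an illustration of the model only, not code from any of these programs, and the function name is our own.

```python
import numpy as np

def admixture_log_likelihood(G, Q, F):
    """Binomial log-likelihood of genotypes under the admixture model.

    G: (n_individuals, n_snps) allele counts in {0, 1, 2}
    Q: (n_individuals, K) ancestry proportions, each row summing to 1
    F: (K, n_snps) allele frequencies in each ancestral population
    """
    G = np.asarray(G, dtype=float)
    # Expected allele frequency for every individual at every SNP.
    P = np.asarray(Q, dtype=float) @ np.asarray(F, dtype=float)
    P = np.clip(P, 1e-12, 1 - 1e-12)  # guard against log(0)
    # The binomial coefficient is dropped: it does not depend on Q or F.
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1 - P)))
```

Inference methods differ in how they search over Q and F (MCMC in STRUCTURE, variational Bayes in fastSTRUCTURE, numerical optimisation in ADMIXTURE), but a better-fitting Q and F always yields a higher value of this function.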
Attempts to reverse the admixture process, inferring the haplotypic makeup of the admixing source groups as well as the date of admixture. GLOBETROTTER allows users to simulate data, include or exclude particular chromosomes, and choose the genetic map used to provide genetic distances. The method has been applied to over 1,400 individuals from 95 worldwide human groups, comprising 17 genotyped groups, 53 groups from the Human Genome Diversity Panel (HGDP) and 25 groups from other sources.
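Admixture dating in methods such as GLOBETROTTER rests on the observation that, after a single pulse of admixture g generations ago, the admixture signal decays with genetic distance d (in Morgans) roughly as exp(-g d). The sketch below recovers g by a log-linear least-squares fit. It assumes a single pulse and a curve with no constant offset; GLOBETROTTER's own fitting is more elaborate, and the function name is hypothetical.

```python
import numpy as np

def estimate_admixture_generations(distance_morgans, curve):
    """Estimate generations since a single admixture pulse from a decay curve.

    Assumes curve(d) ~= A * exp(-g * d) with no constant offset, where d is
    genetic distance in Morgans and g is the number of generations since admixture.
    Fits log(curve) = log(A) - g * d by ordinary least squares.
    """
    d = np.asarray(distance_morgans, dtype=float)
    y = np.asarray(curve, dtype=float)
    slope, intercept = np.polyfit(d, np.log(y), 1)  # returns [slope, intercept]
    return float(-slope), float(np.exp(intercept))
```

For example, a noiseless curve 0.02 * exp(-30 d) over d from 0.001 to 0.5 Morgans gives back g = 30 generations and A = 0.02.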
Uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.
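The correction described above can be sketched in NumPy as follows, modelled on the published EIGENSTRAT description rather than on EIGENSOFT's source: genotypes are normalised per SNP, the top eigenvectors of the individual-by-individual matrix are taken as axes of variation, the ancestry component along those axes is projected out of both genotypes and phenotype, and the statistic is (N - K - 1) times the squared correlation of the adjusted values. The function name and the smoothing of allele frequencies are our own choices.

```python
import numpy as np

def eigenstrat_like_test(G, y, n_pcs=2):
    """PCA-based stratification correction in the spirit of EIGENSTRAT.

    G: (n_individuals, n_snps) genotype counts in {0, 1, 2}; SNPs are assumed polymorphic
    y: (n_individuals,) phenotype, e.g. 0/1 case-control status
    Returns one approximately chi-square(1) statistic per SNP.
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    n = G.shape[0]
    # Normalise each SNP: centre, then scale by sqrt(p(1-p)) with a smoothed p.
    p = (1 + G.sum(axis=0)) / (2 + 2 * n)
    X = (G - G.mean(axis=0)) / np.sqrt(p * (1 - p))
    # Axes of variation: top eigenvectors of the individual-by-individual matrix.
    _, eigvecs = np.linalg.eigh(X @ X.T / X.shape[1])
    axes = eigvecs[:, ::-1][:, :n_pcs]  # eigh sorts eigenvalues in ascending order
    # Project the ancestry axes out of genotypes and phenotype.
    proj = axes @ axes.T
    X_adj = X - proj @ X
    y_c = y - y.mean()
    y_adj = y_c - proj @ y_c
    # Squared correlation between adjusted genotype and adjusted phenotype.
    num = X_adj.T @ y_adj
    denom = np.sqrt((X_adj ** 2).sum(axis=0) * (y_adj ** 2).sum())
    return (n - n_pcs - 1) * (num / denom) ** 2
```

With two differentiated populations and a phenotype that tracks population membership, the uncorrected statistics (n_pcs=0) are inflated across the genome, while removing the first axis brings them back towards their null expectation.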
Offers a platform for population genetic analysis of the available individual genome sequences. G-PhoCS is an application that can estimate ancestral population sizes, population divergence times and migration rates from individual genome sequences. It uses migration bands to model gene flow between populations, and it can also analyse unphased diploid genotypes.
Allows users to infer population structure and relatedness (pedigree structure). GENESIS provides two main modules: (i) PC-AiR, which uses genome-wide single nucleotide polymorphism (SNP) data to infer the structure of a population from a sample that may include known or cryptic relatedness, and (ii) PC-Relate, which estimates genetic relatedness and improves relationship classification.
Identifies and visualizes fine-scale population structure from a genetic relationship matrix among individuals or populations. NetView is an analysis pipeline that combines three software tools to generate a high-definition network visualization of population structure: Super Paramagnetic Clustering (SPC), the Network analysis Tool (NeAT) and Cytoscape. The pipeline is computationally efficient and can easily be applied to large genome-wide data sets to assign individuals to populations and to reveal fine-scale population structure without prior knowledge of individual ancestry.
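One common way to turn a relationship or distance matrix into a population network is a mutual k-nearest-neighbour graph, in which two individuals are linked only if each is among the other's k closest. The sketch below assumes that construction and uses a simple allele-sharing distance; it is illustrative only and does not reproduce NetView's own matrix, SPC step or Cytoscape export. Both function names are hypothetical.

```python
import numpy as np

def allele_sharing_distance(G):
    """Mean absolute genotype difference divided by 2 (0 = identical genotypes)."""
    G = np.asarray(G, dtype=float)
    return np.abs(G[:, None, :] - G[None, :, :]).mean(axis=2) / 2

def mutual_knn_edges(D, k):
    """Edges (i, j), i < j, of a mutual k-nearest-neighbour graph.

    D: symmetric (n, n) distance matrix. An edge is kept only if i is among
    j's k nearest neighbours AND j is among i's k nearest neighbours.
    """
    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, np.inf)  # an individual is not its own neighbour
    nearest = np.argsort(D, axis=1)[:, :k]
    neighbour_sets = [set(row.tolist()) for row in nearest]
    edges = []
    for i, nbrs in enumerate(neighbour_sets):
        for j in nbrs:
            if i < j and i in neighbour_sets[j]:
                edges.append((i, j))
    return sorted(edges)
```

Small values of k keep only the strongest links, which is what lets fine-scale clusters separate visually; larger k produces a denser, more connected network.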
Infers parameters of population history in a maximum likelihood framework using Monte Carlo simulation. pIPHULA is parallelized software that supports the large-scale, multi-gene studies needed to estimate demographic parameters from the amounts of DNA sequence data now available. The program includes a script to generate figures from the results.
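The general idea of Monte Carlo maximum likelihood can be shown on the simplest coalescent problem: estimating the scaled mutation rate theta = 4N*mu from the number of segregating sites S in a sample of n sequences. Genealogies are simulated under the standard coalescent, the Poisson probability of S is averaged over them, and the best theta on a grid is kept. This is a generic textbook illustration, not pIPHULA's model, and all names are our own.

```python
import math
import numpy as np

def simulate_total_branch_lengths(n, n_sims, rng):
    """Total tree length of standard coalescent genealogies for n samples.

    Time is in units of 2N generations, so while k lineages remain the waiting
    time is exponential with rate k(k-1)/2.
    """
    total = np.zeros(n_sims)
    for k in range(2, n + 1):
        rate = k * (k - 1) / 2
        total += k * rng.exponential(1 / rate, size=n_sims)
    return total

def monte_carlo_log_likelihood(theta, S, tree_lengths):
    """Estimate log P(S | theta) by averaging Poisson probabilities over simulated trees.

    With theta = 4N*mu, mutations fall on a tree of total length L at rate theta/2.
    """
    lam = theta / 2 * tree_lengths
    log_p = S * np.log(lam) - lam - math.lgamma(S + 1)
    m = log_p.max()  # log-mean-exp for numerical stability
    return m + math.log(np.mean(np.exp(log_p - m)))

def mle_theta(S, n, grid, n_sims=20000, seed=1):
    rng = np.random.default_rng(seed)
    # Reusing the same simulated trees for every theta keeps the likelihood surface smooth.
    lengths = simulate_total_branch_lengths(n, n_sims, rng)
    lls = [monte_carlo_log_likelihood(t, S, lengths) for t in grid]
    return float(grid[int(np.argmax(lls))])
```

For n = 10 sequences and S = 20 segregating sites, mle_theta(20, 10, np.linspace(1, 20, 191)) should land close to the Watterson estimate of about 7; real tools extend the same principle to many loci and richer demographic parameters.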
A C++ software tool for fast and accurate estimation of individual ancestry coefficients in datasets that exhibit population structure. PSIKO takes as input a file in .geno format, in which each row corresponds to a SNP and each column to an individual. It estimates the number of founder populations and outputs ancestry estimates as well as the principal components of the dataset for subsequent use in association studies.
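A .geno file of the kind described above can be loaded for downstream checks with a few lines of Python. The parser below assumes the convention used by several population-structure tools (one character per individual, 0/1/2 allele counts, 9 for missing data); check PSIKO's documentation for its exact encoding before relying on it. The function name is hypothetical.

```python
import numpy as np

def read_geno(lines):
    """Parse .geno-style text: one row per SNP, one character per individual.

    Assumes 0/1/2 allele counts and 9 for missing genotypes.
    Returns an (n_individuals, n_snps) float array with NaN for missing values.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rows.append([int(c) for c in line])
    G = np.array(rows, dtype=float).T  # transpose so individuals become rows
    G[G == 9] = np.nan
    return G
```

Usage: with open("data.geno") as fh: G = read_geno(fh), where "data.geno" is a placeholder file name.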
Identifies potential transmissions between hosts in epidemiological studies of viral disease outbreaks. MinDistB defines the distance between two intra-host viral populations as the minimum Hamming distance between their representatives, and it can also take into account the sizes of the relative borders of each pair of viral populations. The tool was tested on experimental outbreak sequencing data.
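The core distance is simple to state in code: the minimum Hamming distance over all pairs of aligned sequences drawn from the two hosts. The sketch below computes it and flags host pairs under a user-chosen threshold. The threshold rule and the function names are assumptions for illustration, and the border-size refinement mentioned above is not implemented.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def min_distance(pop_a, pop_b):
    """Minimum Hamming distance between any representative of pop_a and any of pop_b."""
    return min(hamming(a, b) for a, b in product(pop_a, pop_b))

def candidate_transmissions(populations, threshold):
    """Host pairs whose viral populations lie within `threshold` of each other.

    populations: dict mapping host name -> list of aligned sequences.
    """
    names = sorted(populations)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            d = min_distance(populations[x], populations[y])
            if d <= threshold:
                pairs.append((x, y, d))
    return pairs
```

Using the minimum rather than, say, the mean pairwise distance reflects the idea that a transmission event typically passes on only a few closely related variants.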
Generates a population of 3D genome structures where each domain is represented as a sphere. PGS is a user-friendly software package that runs on local machines and high performance computing platforms. The software automatically generates an analysis of the structure population, including a description of the model quality based on its contact probability agreement with experiments and various structural genome features, including the radial nuclear positions of individual chromatin domains. The individual genome structures also contain a wealth of information and can be used to detect higher-order structural patterns of chromatin regions.
Provides a predictive model for microbiome composition data. BioMiCo facilitates the interpretation of community structure in light of user-defined feature labels. It is a hierarchical model that simultaneously learns how assemblages of operational taxonomic units (OTUs) contribute to microbiome structure and how multiple assemblages might be related to the known features of the samples.
Assists users in manipulating large multilocus molecular datasets. Its functionality can be divided into diagnostic, manipulation, sampling, simulation and transformation tools. Metadata can be extracted efficiently from large genomic data sets without viewing the data in a text editor. genepopedit works cross-platform and integrates easily into existing population genomics workflows, either directly through R or in combination with other genomic analysis software. Importantly, genepopedit provides a simple yet robust code-based tool for repeatable genomic data manipulation, and it has been shown to be stable for data sets of more than 200,000 single nucleotide polymorphisms (SNPs).
Finds recombination hotspots from population genetic data. SequenceLDhot is based on an approximate marginal likelihood method. It scans a chromosomal region of interest over a grid of candidate hotspot positions and assesses the evidence for a hotspot at each of these positions.
Allows users to analyse and visualise population structure. pophelper supports run files produced by population analysis programs such as STRUCTURE and TESS, as well as numeric delimited formats such as those produced by ADMIXTURE or fastSTRUCTURE. The pophelper package can tabulate and summarise runs, estimate K using the Evanno method, export files for CLUMPP and DISTRUCT, and generate barplot figures. The pophelper R package and web app are available to help users working with molecular markers investigate population structure.
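The Evanno method picks K from replicate runs by looking for the sharpest change in the rate of increase of the log-likelihood: Delta K = |L(K+1) - 2L(K) + L(K-1)| / sd(L(K)). The sketch below computes it from the mean log-likelihood at each K, the way summary tables commonly present it. The use of the sample standard deviation (ddof=1) is an assumption, and the function is not pophelper's implementation.

```python
import numpy as np

def evanno_delta_k(lnp_by_k):
    """Evanno's Delta K from replicate log-likelihoods.

    lnp_by_k: dict mapping K -> list of LnP(D) values from replicate runs.
    Delta K is defined only for K values whose neighbours K-1 and K+1 were also run.
    A zero standard deviation at some K would make Delta K infinite.
    """
    ks = sorted(lnp_by_k)
    mean = {k: np.mean(lnp_by_k[k]) for k in ks}
    sd = {k: np.std(lnp_by_k[k], ddof=1) for k in ks}  # sample SD (assumption)
    delta = {}
    for k in ks:
        if k - 1 in mean and k + 1 in mean:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = float(second_diff / sd[k])
    return delta
```

The K with the largest Delta K is usually reported as the uppermost level of structure; Delta K cannot be computed for the smallest or largest K tested.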
Substantially decreases the time required to validate and conduct hybrid detection by parallelizing analyses with NEWHYBRIDS. parallelnewhybrid enables exploration of the power of hybrid class assignment and the use of larger datasets than was previously feasible with NEWHYBRIDS. The tool comprises an example data set, a readme and three operating-system-specific functions that run parallel NEWHYBRIDS analyses on each of a computer's cores.
Supports the detection, with estimates of efficiency and accuracy, of multi-generational hybrid individuals using genetic or genomic data in conjunction with the program NEWHYBRIDS. hybriddetective includes functions for developing and testing diagnostic panels of markers, simulating multi-generational hybrids, and quantifying and visualizing the accuracy with which (simulated) hybrids can be detected. Overall, this package delivers a streamlined hybrid analysis platform, providing improvements in speed, ease of use and repeatability over current ad hoc approaches.
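Simulating hybrid classes amounts to repeatedly crossing simulated parents: each offspring receives one randomly chosen allele from each parent at every locus. The sketch below shows that idea with unlinked loci and alleles coded 0/1. It is a generic illustration under those assumptions, not hybriddetective's R implementation, and the function names are our own.

```python
import numpy as np

def sample_pure(freqs, n, rng):
    """n diploid individuals from a population with the given allele-1 frequencies.

    Returns an (n, n_loci, 2) array of alleles coded 0/1.
    """
    freqs = np.asarray(freqs, dtype=float)
    return (rng.random((n, freqs.size, 2)) < freqs[None, :, None]).astype(int)

def cross(parents1, parents2, rng):
    """One offspring per parent pair, drawing one allele from each parent per locus.

    Assumes unlinked loci, so each locus segregates independently.
    """
    n, n_loci, _ = parents1.shape
    pick1 = rng.integers(0, 2, size=(n, n_loci))
    pick2 = rng.integers(0, 2, size=(n, n_loci))
    rows = np.arange(n)[:, None]
    loci = np.arange(n_loci)[None, :]
    return np.stack([parents1[rows, loci, pick1], parents2[rows, loci, pick2]], axis=2)
```

With two parental populations fixed for different alleles at 50 diagnostic loci, every F1 is heterozygous at all loci, while F2 and backcross individuals are heterozygous at roughly half of them; these are the genotype patterns that hybrid-class assignment relies on.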
Calculates estimators of K, the number of genetic clusters. StructureSelector is web-based software that helps users select and visualize the best estimates of K for a given set of result files. It implements the MedMedK, MedMeaK, MaxMedK and MaxMeaK estimators as well as two other estimators. It can also generate graphical representations of the results, facilitating data submission and the rapid import of graphical plots.
Allocates individuals to populations and simulates populations using amplified fragment length polymorphism (AFLP) markers. AFLPOP adapts Paetkau’s allocation method, originally developed for co-dominant alleles, to these dominant markers. It can provide information on the rates and types of incorrect allocations and on the empirical distributions of likelihood statistics. A filtering procedure allows loci to be selected according to user-defined criteria.
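Likelihood-based allocation with dominant markers can be sketched as follows: estimate band-presence frequencies in each reference population, score an individual's band profile under each population, and allocate it to the population with the highest log-likelihood. The pseudocount used to avoid zero frequencies is an assumption standing in for whatever correction AFLPOP applies, and the function names are hypothetical.

```python
import numpy as np

def band_frequencies(profiles, pseudocount=0.5):
    """Per-locus band-presence frequency in a reference sample.

    profiles: (n_individuals, n_loci) array, 1 = band present, 0 = absent.
    The pseudocount keeps frequencies strictly between 0 and 1 (an assumption).
    """
    profiles = np.asarray(profiles, dtype=float)
    n = profiles.shape[0]
    return (profiles.sum(axis=0) + pseudocount) / (n + 2 * pseudocount)

def allocate(individual, freqs_by_pop):
    """Return (most likely population, log-likelihood per population) for one band profile."""
    x = np.asarray(individual, dtype=float)
    lls = {
        pop: float(np.sum(x * np.log(f) + (1 - x) * np.log(1 - f)))
        for pop, f in freqs_by_pop.items()
    }
    return max(lls, key=lls.get), lls
```

The gap between the best and second-best log-likelihood can serve as a confidence measure, and repeating the allocation on simulated individuals gives empirical rates of misallocation of the kind reported above.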
A principal component analysis (PCA) analogue for binomial data that estimates latent structure in the natural parameter. LFA directly models the logit transformation of the probabilities underlying observed genotypes in terms of latent variables that capture population structure. The method was demonstrated on data from the Human Genome Diversity Panel and the 1000 Genomes Project, where it identified SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.
Integrates STRUCTURE analysis with post-processing in a pipeline and implements parallel computation. StrAuto is a Python program that streamlines population structure analysis using parallel computing. Its pipeline combines STRUCTURE analysis with the Evanno K analysis and visualization of results using STRUCTURE HARVESTER, and it distributes runs over multiple processors with GNU Parallel. These features make StrAuto well suited to high performance computing clusters and multi-core personal workstations, where they reduce computational time.