Association mapping software tools | Genome-wide association study data analysis
In recent years, association mapping has been widely used to dissect complex quantitative traits and identify candidate genes affecting them. This strategy relies on detecting linkage disequilibrium (LD) between genetic markers and genes controlling the phenotype of interest. It exploits the recombination events that accumulate over many generations, which increases the accuracy of the associations detected.
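As an illustration of the LD measure that association mapping relies on, the following minimal sketch computes the squared correlation r^2 between two biallelic loci from phased haplotypes. The function name `ld_r_squared` and the 0/1 allele coding are assumptions made for this example; they are not taken from any tool listed on this page.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with each allele coded 0 or 1 (hypothetical coding for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Alleles always travel together: complete LD.
perfect_ld = [(1, 1)] * 5 + [(0, 0)] * 5
# All four haplotypes equally frequent: no LD.
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(perfect_ld))
print(ld_r_squared(no_ld))
```

Values near 1 mean a marker is a good proxy for a nearby causal variant. Values near 0 mean recombination has broken the association down.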
A free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
Reduces computational time for analyzing large genome-wide association study (GWAS) data sets. EMMAX aims to prevent the overdispersion of test statistics by using a statistical model that explicitly takes sample structure into account, rather than correcting the overdispersed test statistics that result when genetic relatedness is left out of the statistical model.
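Overdispersion of test statistics is commonly summarized by the genomic inflation factor, the ratio of the observed median association statistic to the median expected under the null. The sketch below is a generic diagnostic and not part of EMMAX. It assumes the inputs are 1-degree-of-freedom chi-square statistics, and the name `genomic_inflation` is hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Return the genomic inflation factor for 1-df chi-square statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical statistics: a well-calibrated scan and one inflated twofold.
calibrated = [0.10, 0.4549364, 1.50]
inflated = [2 * s for s in calibrated]

print(genomic_inflation(calibrated))
print(genomic_inflation(inflated))
```

A value close to 1 suggests well-calibrated statistics. Values clearly above 1 point to unmodeled sample structure, the situation a mixed model is meant to avoid.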
Conducts mixed model analysis in a small number of O(MN)-time iterations. BOLT-LMM employs a Gaussian mixture model of single nucleotide polymorphism (SNP) effects and tests the residuals for association with candidate markers via a retrospective score statistic. This allows users to model non-infinitesimal genetic architectures. The tool offers increased association power over standard mixed model analysis while controlling false positives.
Allows simulation of cell structure and function. DCell is an interpretable or “visible” neural network (VNN) simulating a basic eukaryotic cell. The functional state of each subsystem is represented by a bank of neurons, and the connectivity of these neurons is set to mirror the biological hierarchy. The software's hierarchical structure captures many different clusters of features at multiple scales, pushing interpretation from the model input to internal features representing biological subsystems.
Controls for the disruptive effects of both population structure and recombination. treeWAS is a phylogenetic approach that is able to overcome many of the limitations of existing microbial genome-wide association study (GWAS) approaches. This method simulates a null genetic dataset to establish whether high association score values in the empirical dataset are statistically significant. It provides both specificity and power in a wide range of settings.
Implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time. It also provides users access to tables and graphs for interpreting results.
Provides a method for genetic association mapping of binary traits in samples with related individuals. CERAMIC incorporates relevant covariates and pedigree information, and can use data on individuals with partially missing data. Moreover, the software is able to correct binary phenotypes for both covariates and additive polygenic effects and can be used for performing calculations for current association studies.
Scores the components of a pan-genome for associations to traits while accounting for population stratification. Scoary allows users to study the association between the presence or absence of pan-genome genes and observed phenotypes. The software sequentially scores each candidate gene in the accessory genome according to its apparent correlation to predefined traits. Genes that pass the initial screening are re-analyzed while incorporating information on the phylogenetic structure of the sample.
Combines statistical analysis modules into pipelines to deal with heterogeneous big data. T-BioInfo is an application that can be used for: (1) next-generation sequencing (NGS) data (transcriptomics, genomics/epigenetics, and DNA/RNA); (2) mass-spectroscopy; (3) structural biology; and (4) data integration and modeling (virology, data association, and data mining).
Aggregates association strength of individual markers into pre-specified biological pathways. VEGAS2 is a versatile pathway-based approach for genome-wide association study (GWAS) data that accounts for gene size and linkage disequilibrium between markers using simulations from the multivariate normal distribution. First, it calculates the gene-based test statistics for all genes using the VEGAS (VErsatile Gene-based Association Study) approach, which accounts for the linkage disequilibrium (LD) between the single nucleotide polymorphisms (SNPs) within a gene through simulation. Second, for each of a set of pre-specified gene-sets, the relevant gene-based results are carried forward to compute a pathway-based test.
Implements general linear model and mixed linear model approaches for controlling population and family structure. For result interpretation, the program allows linkage disequilibrium statistics to be calculated and visualized graphically. Database browsing and data importation are facilitated by integrated middleware. Other features include analyzing insertions/deletions, calculating diversity statistics, integrating phenotypic and genotypic data, imputing missing data and calculating principal components.
Allows subset-based analysis of heterogeneous traits and subtypes. ASSET provides statistical tools designed for pooling association signals across multiple studies when true effects may exist only in a subset of the studies and possibly in opposite directions across studies. It was originally developed for conducting genetic association scans but can also be applied to the analysis of non-genetic risk factors.
Identifies association between a candidate marker and a quantitative trait of interest using unrelated individuals. QSAT is a quantitative similarity-based association test that controls for population stratification through a set of genomic markers. QSAT has a correct type I error rate in the presence of population structure and is more powerful than family-based association designs.
An R library for genome-wide association (GWA) analysis. GenABEL implements effective storage and handling of GWA data, fast procedures for genetic data quality control, testing of association of single nucleotide polymorphisms with binary or quantitative traits, visualization of results and also provides easy interfaces to standard statistical and graphical procedures implemented in base R and special R libraries for genetic analysis.
Enables researchers working with Arabidopsis thaliana to perform genome-wide association studies (GWAS) on their phenotypes. GWAPP features an extensive, user-friendly interface that includes interactive Manhattan plots and linkage disequilibrium plots. It also facilitates exploratory data analysis by implementing features such as the inclusion of candidate polymorphisms in the model as cofactors.
Starts from genome-wide association study (GWAS) data to deduce relationships using high-density genotype data. KING allows users to estimate pedigree information and includes additional functions to detect population structure in the presence of genetic relatedness or to compute allele frequency statistics. This program can be applied to samples with thousands of individuals genotyped at millions of autosomal single nucleotide polymorphisms (SNPs).
Provides semantic similarity computations among Disease Ontology (DO) terms and genes which allows biologists to explore the similarities of diseases and of gene functions in disease perspective. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented to support discovering disease associations of high-throughput biological data. Comparison among gene clusters is also supported. DOSE provides several DO-specific visualization functions to produce highly customizable, publication-quality figures of similarity and enrichment analyses that are not available elsewhere. With these visualization tools, the results obtained by DOSE are more interpretable.
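The hypergeometric model mentioned above can be sketched as an over-representation test: given a gene universe, a set of genes annotated to a disease term, and a list of selected genes, it asks how likely an overlap at least as large as the observed one is by chance. This is a generic illustration and not DOSE's own interface. The function name `hypergeom_enrichment_p` and its parameters are assumptions made for this example.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    universe:  total number of genes considered
    annotated: genes in the universe annotated to the term
    selected:  genes in the user's list
    overlap:   selected genes that are annotated to the term
    """
    total = comb(universe, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes, 5 annotated, 5 selected, all 5 overlap.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

In practice the p-values from many terms would also need multiple-testing correction, which this sketch leaves out.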
Supports quality control and analysis of genome-wide association studies (GWAS). GWASTools provides functions for interactive investigation and supports intensity data. It can be used to verify pedigrees for accuracy, as well as to deduce pairwise relationships. This tool can plot kinship coefficients and offers several other plot types, including genotype cluster plots, B allele frequency (BAF)/log R ratio (LRR) plots with chromosome ideograms, quantile-quantile plots and Manhattan plots.
Allows users to obtain statistical and graphical summaries for comparisons across strata. EasyStrata is an application that facilitates the evaluation and graphical presentation of stratified genome-wide association meta-analysis (GWAMA) results for each single-nucleotide polymorphism (SNP) genome-wide. It permits investigation of potential gene-strata (GxS) effects. This method also streamlines the handling of large-scale genome-wide association (GWA) data sets.
Provides a haplotype-based method for association mapping. GLASCOW is based on generalized linear mixed models (GLMMs) accounting for stratification and other covariates affecting the modeled trait. It also proves flexible, since there is no need to define arbitrary windows and haplotype origin can change at any position along the chromosome.