A free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
Supplies a method to compute exact values of standard test statistics in linear mixed models. GEMMA is a program built on the EMMA software. The application fits three types of models: univariate and multivariate linear mixed models, as well as a Bayesian sparse linear mixed model. In addition, it estimates variance components and chip heritability. This tool provides a means of making exact calculations for large genome-wide association studies (GWAS).
A linear mixed model (LMM) algorithm for testing sets of genetic markers in the presence of confounding structure such as arises from ethnic diversity and family relatedness within a cohort. FaST-LMM uses two random effects—one to capture the set association signal and one to capture confounders.
Provides a method for analyzing resequencing data. The proposed algorithm is based on a genome continuum model and functional principal components. It was developed to test the phenotypic association of rare variants with high power and nominal type I error rates, while buffering the impact of sequencing errors and missing data.
Identifies candidate causal single nucleotide polymorphisms (SNPs) and their corresponding candidate causal pathways from genome-wide association study (GWAS) data. ICSNPathway is a web server that integrates linkage disequilibrium (LD) analysis, functional SNP annotation and pathway-based analysis (PBA). The software can help improve the interpretation of GWAS data, from variants to biological mechanisms, and so better guide future mechanistic studies.
Performs knowledge-based secondary analyses from genome-wide association studies (GWAS). KGG is a statistical framework to classify, weight, prioritize and interpret association p-values. It simultaneously models both the diverse biological knowledge and statistical association p-values to produce optimal weights for the prioritization. It can also find additional single nucleotide polymorphisms (SNPs) of the HapMap dataset in strong linkage disequilibrium (LD).
A computational algorithm to search for gene-disease associations from GWASs, taking advantage of independent eQTL data. Sherlock is applicable to any complex phenotype. It is readily generalizable to molecular traits other than gene expression, such as metabolites, noncoding RNAs, and epigenetic modifications.
An R library for genome-wide association (GWA) analysis. GenABEL implements effective storage and handling of GWA data, fast procedures for genetic data quality control, testing of association of single nucleotide polymorphisms with binary or quantitative traits, visualization of results and also provides easy interfaces to standard statistical and graphical procedures implemented in base R and special R libraries for genetic analysis.
A simple, ready-to-use program designed to analyze genetic-epidemiology association studies using SNPs. Main capabilities include descriptive analysis and tests for Hardy-Weinberg equilibrium and linkage disequilibrium. Analysis of association is based on linear or logistic regression according to the response variable (quantitative or binary disease status, respectively). Analysis of single SNPs: multiple inheritance models (co-dominant, dominant, recessive, over-dominant and log-additive), and analysis of interactions (gene-gene or gene-environment). Analysis of multiple SNPs: haplotype frequency estimation, analysis of association of haplotypes with the response, including analysis of interactions.
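The inheritance models listed above differ only in how a genotype is turned into a regression predictor. The following is a minimal sketch of that encoding, assuming genotypes are stored as minor-allele counts (0, 1 or 2); the names `MODELS` and `encode` are illustrative and are not taken from the tool itself.

```python
# Genotypes are coded as the count of the minor allele: 0, 1 or 2.
# Each inheritance model maps that count to the predictor used in the regression.
MODELS = {
    "dominant": lambda g: 1 if g >= 1 else 0,       # one copy is enough
    "recessive": lambda g: 1 if g == 2 else 0,      # two copies are required
    "over-dominant": lambda g: 1 if g == 1 else 0,  # heterozygotes differ from both homozygotes
    "log-additive": lambda g: g,                    # each copy adds the same effect
}


def encode(genotypes, model):
    """Return the regression predictor(s) for a list of 0/1/2 genotypes."""
    if model == "co-dominant":
        # Two indicator columns: (heterozygote, minor-allele homozygote).
        return [(1 if g == 1 else 0, 1 if g == 2 else 0) for g in genotypes]
    return [MODELS[model](g) for g in genotypes]


genotypes = [0, 1, 2, 1, 0]
for name in ["co-dominant", "dominant", "recessive", "over-dominant", "log-additive"]:
    print(name, encode(genotypes, name))
```

The encoded column would then enter a linear or logistic regression, depending on whether the response is quantitative or binary.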
Offers a way to solve large-scale, numerically intensive genome-wide association study (GWAS) calculations on multi-core symmetric multiprocessing computer architectures. SNPRelate computes sample and single nucleotide polymorphism (SNP) eigenvectors. It supports principal component analysis (PCA) and identity-by-descent (IBD) relatedness analysis on genotype files in the genomic data structure (GDS) format. The tool accelerates computations on SNP data.
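To make the idea of a sample eigenvector concrete, here is a minimal pure-Python sketch that extracts the leading principal component of a small genotype matrix by power iteration. It is not SNPRelate's implementation; the function name `top_sample_pc`, the toy data, and the simple mean-centring are assumptions made for illustration, and the sketch assumes the samples are not all identical.

```python
import math


def top_sample_pc(genotypes, iters=200):
    """Leading sample eigenvector; genotypes is a list of samples of 0/1/2 counts."""
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP column by subtracting its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # n x n sample similarity matrix K = X X^T / m.
    k = [[sum(a * b for a, b in zip(x[i], x[j])) / m for j in range(n)]
         for i in range(n)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


# Two toy populations of three samples each, typed at four SNPs.
toy = [
    [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],
    [0, 0, 2, 2], [0, 0, 2, 1], [0, 1, 2, 2],
]
pc1 = top_sample_pc(toy)
print([round(t, 3) for t in pc1])
```

In this toy data the first three samples receive one sign on the leading component and the last three the opposite sign, which is how PCA exposes population structure.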
Contains classes and methods to support the analysis of whole genome association studies. SNPassoc utilizes S4 classes and extends the haplo.stats R package to facilitate haplotype analyses. The package is useful for carrying out the most common analyses in whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, testing for Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (for either quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Permutation tests and related tests (sum statistic and truncated product) are also implemented.
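As an illustration of a Hardy-Weinberg equilibrium check, the sketch below computes the classical chi-square goodness-of-fit statistic from genotype counts. This is an assumption about one common way to run the test; the package may use a different (for example, exact) test, and the function name `hwe_chi_square` is hypothetical. The sketch assumes both alleles are observed, so no expected count is zero.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for departure from HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts that match HWE proportions exactly, and counts with no heterozygotes.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

A statistic above 3.84 rejects equilibrium at the 5% level for one degree of freedom.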
A Bayesian approach to incorporate a set of important covariates into the fdr under a heteroscedastic model, where the probability of non-null status and the distribution of the test statistic under the non-null hypothesis are both modulated by covariates. The primary advantage of our methodology over traditional fdr methods is that two SNPs with the same z score can have different values of cmfdr if one is in a more enriched category than the other. Hence, by using SNP annotations to modulate fdr, more SNPs can be discovered for a given level of fdr control. In other words, methods such as cmfdr that break the exchangeability assumption are potentially more powerful than traditional fdr methods that assume exchangeability.
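The claim that two SNPs with the same z score can receive different fdr values follows directly from the two-groups formula once the null proportion depends on the annotation. The sketch below shows this with a zero-mean normal non-null density; the values of `pi0` and `nonnull_sd` are made up for illustration, whereas the method described above estimates such quantities from covariates within a Bayesian model.

```python
import math


def normal_pdf(z, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def local_fdr(z, pi0, nonnull_sd):
    """Two-groups local fdr: posterior probability that the SNP is null given z."""
    null = pi0 * normal_pdf(z, 1.0)              # null z scores are N(0, 1)
    alt = (1 - pi0) * normal_pdf(z, nonnull_sd)  # non-null z scores are more dispersed
    return null / (null + alt)


z = 4.0
# Same z score, different annotation categories (hypothetical null proportions).
fdr_enriched = local_fdr(z, pi0=0.90, nonnull_sd=3.0)
fdr_depleted = local_fdr(z, pi0=0.999, nonnull_sd=3.0)
print(fdr_enriched, fdr_depleted)
```

The SNP in the enriched category gets the smaller fdr, so it is more likely to be declared a discovery at a given threshold.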
A high-dimensional variable selection method for survival analysis that improves on existing variable selection methods in several respects. First, we have developed a computationally feasible variable selection approach for high-dimensional survival analysis. Second, we have designed a random sampling scheme to improve the control of the false discovery rate. Finally, the proposed framework is flexible enough to accommodate complex data structures. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries.
A GPU-based permutation tool for highly reliable P-value estimation. In terms of speed, PBOOST completed 10^7 permutations for a single SNP pair from the Wellcome Trust Case Control Consortium (WTCCC) genome data (Wellcome Trust Case Control Consortium, 2007) within 1 min on a single Nvidia Tesla M2090 device, whereas the same task took 60 min on a single Intel Xeon E5-2650 CPU.
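For readers unfamiliar with permutation P-values, the following CPU-only sketch shows the basic estimator on a simple two-group comparison. It is not PBOOST's GPU algorithm or its interaction statistic; the function name, the mean-difference statistic and the toy data are assumptions chosen to keep the example short.

```python
import random


def permutation_p_value(case_values, control_values, n_perm=10_000, seed=0):
    """Estimate a P-value by shuffling group labels n_perm times."""
    rng = random.Random(seed)

    def stat(a, b):
        # Absolute difference in group means.
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(case_values, control_values)
    pooled = list(case_values) + list(control_values)
    n_case = len(case_values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


p = permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4], n_perm=2000)
print(p)
```

The smallest value this estimator can return is 1 / (n_perm + 1), which is why reliable estimates of very small P-values require a very large number of permutations.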
A software tool that estimates the p-value of a gene using information on annotation, single marker GWA results and genotype. The software tool is species and annotation independent, fast, highly parallelized, and ready for high-density marker studies.
A method that divides the p-values into multiple groups and combines them at the group level. GCP integrates the significance values at different levels, which improves power. GCP effectively controls the type I error rate and has additional power over existing methods; the power increase can exceed 50% in some situations.
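The general idea of combining p-values within groups and then across groups can be sketched with Fisher's method. This is only an illustration of two-level combination under the assumption of independent tests; it is not GCP's actual statistic, and the function names and the fixed `group_size` are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Fisher's combined P-value for independent p-values (each must be > 0)."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2, with X = -2 * sum(log p)
    # Survival function of a chi-square with 2k degrees of freedom (closed form).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def grouped_combination(p_values, group_size):
    """Combine p-values within consecutive groups, then combine the group results."""
    groups = [p_values[i:i + group_size] for i in range(0, len(p_values), group_size)]
    group_p = [fisher_combine(g) for g in groups]
    return fisher_combine(group_p), group_p


overall, per_group = grouped_combination([0.01, 0.03, 0.40, 0.70, 0.20, 0.90], 2)
print(overall, per_group)
```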
A hybrid approach that includes the principal components (PCs) of the genotype matrix as fixed effects in FaST-LMM Select. PC-Select leverages the advantages of the FaST-LMM Select framework while correcting for population stratification. The two main steps of FaST-LMM Select are ranking SNPs by linear regression P-values to form the genetic relationship matrix (GRM) with the top-ranked SNPs and then calculating association statistics in a mixed-model framework, using this GRM. We used the top five PCs as fixed effects in both of these steps. As a result, PC-Select yields noninflated test statistics in the presence of population stratification and maintains high power to detect causal SNPs.
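The first of the two steps described above, ranking SNPs and forming the GRM from the top-ranked ones, can be sketched as follows. This is a simplified illustration: it omits the PC fixed effects and the mixed-model association test, ranks by squared correlation (which orders SNPs the same way as single-SNP linear regression P-values at a fixed sample size), and assumes no SNP is monomorphic. The function names are hypothetical.

```python
import math


def standardize(col):
    """Centre a column and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]


def select_and_build_grm(genotypes, phenotype, top_k):
    """genotypes: list of SNP columns, each holding one allele count per sample."""
    n = len(phenotype)
    y = standardize(phenotype)
    z = [standardize(col) for col in genotypes]
    # Larger squared correlation with the phenotype means a smaller regression P-value.
    r2 = [(sum(a * b for a, b in zip(col, y)) / n) ** 2 for col in z]
    ranked = sorted(range(len(z)), key=lambda j: r2[j], reverse=True)
    top = ranked[:top_k]
    # GRM entry (a, b): average product of standardized genotypes over selected SNPs.
    grm = [[sum(z[j][a] * z[j][b] for j in top) / top_k for b in range(n)]
           for a in range(n)]
    return top, grm


phenotype = [1.0, 2.0, 3.0, 4.0]
snps = [[0, 1, 1, 2], [1, 0, 1, 0], [2, 2, 0, 0]]
top, grm = select_and_build_grm(snps, phenotype, top_k=2)
print(top)
```

The resulting matrix would then serve as the covariance structure of the random effect in the mixed-model step.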
Comprises a family of statistical methods designed to identify weak associations in genome-wide association studies that are not detectable by conventional analytical methods. Puma uses a regularized multiple regression in a penalized maximum likelihood framework using a generalized linear model in order to simultaneously consider tens to hundreds of thousands of genetic markers in a single statistical model. These methods are able to consider both case/control and continuous phenotypes and are optimized to efficiently handle very large datasets.
Enables the combined analysis of single nucleotide polymorphism (SNP) genotype calls, copy numbers, polymorphic copy number variations (CNVs) and gene expression. SNPExpress is available for use with Affymetrix DNA mapping arrays, the Illumina HumanHap550 Genotyping BeadChip and Affymetrix GeneChips. The software facilitates the identification of biologically and clinically relevant entities. It can be useful for genome-wide studies by providing an integrated view of data from DNA mapping and mRNA expression arrays.
Implements a method for computing posterior association probabilities of single-nucleotide polymorphisms (SNP) (and other quantities) in genome-wide association studies (GWAS) using Bayesian variable selection and model averaging. Bmagwa considers simultaneously all available variants for inclusion as predictors in a linear genotype-phenotype mapping and averages over the uncertainty in the variable selection. This leads to naturally interpretable summary quantities on the significances of the variants and their contribution to the genetic basis of the studied trait.