Simulation software tools | Genome-wide association study data analysis
Association analysis between single nucleotide polymorphisms (SNPs) and a disease or endpoint in genome-wide association studies (GWAS) is considered a powerful strategy for investigating genetic susceptibility and identifying significant biomarkers. Simulated data are widely used to evaluate experimental designs and to measure the performance of statistical analysis approaches.
Simulates sequence data efficiently under user-specified disease or quantitative trait models. SeqSIMLA is useful for evaluating the statistical properties of new study designs and new statistical methods that use next-generation sequencing (NGS) data.
Mimics highly divergent DNA sequences and protein superfamilies. iSG simulates protein sequence evolution and builds realistic protein families. It uses multiple related root sequences to construct a large simulated sequence space. This tool implements subsequence length constraints and lineage- and site-specific evolution. It is useful for testing the accuracy of multiple alignment methods or evolutionary hypotheses.
Infers the strength of association between the rate of sequence evolution in a given genomic region and a phenotype of interest. TraitRateProp can test the hypothesis of an association between the sequence evolutionary rate and the phenotypic trait. It performs per-site predictions to identify sequence sites whose rate is more likely to be in association with the phenotypic trait. This tool enables the recognition of putatively functional sequence sites that are spatially close to each other in 3D space.
Simulates multiple nearby disease single nucleotide polymorphisms (SNPs) on the same chromosome. HAPGEN is based on an alternative resampling method that uses a reference panel of haplotypes to generate a sample with patterns of linkage disequilibrium (LD) similar to those in the reference panel. It is intended for investigating disease models that involve multiple disease SNPs in close proximity.
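The resampling idea can be illustrated with a minimal Python sketch. This is not HAPGEN's actual algorithm: it assumes a toy reference panel and a single hypothetical per-SNP switch probability (`switch_prob`) in place of a real recombination map. Each new haplotype is a mosaic copied from panel haplotypes, so short-range allele combinations found in the panel tend to be preserved.

```python
import random

def resample_haplotype(panel, switch_prob, rng):
    """Build one new haplotype as a mosaic of reference haplotypes.

    At each SNP the current copy source is kept with probability
    1 - switch_prob; otherwise a new source is drawn uniformly from the panel.
    """
    n_snps = len(panel[0])
    source = rng.randrange(len(panel))
    new_hap = []
    for pos in range(n_snps):
        if pos > 0 and rng.random() < switch_prob:
            source = rng.randrange(len(panel))
        new_hap.append(panel[source][pos])
    return new_hap

# Toy reference panel (4 haplotypes x 6 SNPs, 0/1 alleles), invented for illustration.
panel = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
]

rng = random.Random(1)
sample = [resample_haplotype(panel, 0.1, rng) for _ in range(10)]
# With switch_prob = 0 every simulated haplotype is an exact copy of a panel member.
copies = [resample_haplotype(panel, 0.0, rng) for _ in range(10)]
```

A lower `switch_prob` keeps longer stretches of each reference haplotype intact, which is what makes the LD in the simulated sample resemble the LD in the panel.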
Helps users simulate phenotypes and evaluate genome-wide association study (GWAS) methods. NaturalGWAS is a simulation model able to incorporate realistic features such as gene-by-environment interactions, where environment is derived from a climate database. This package implements an approach that performs tests using information on confounding variables.
Simulates phenotypes under different models including genetic variant effects and infinitesimal genetic effects, as well as correlated, non-genetic covariates and observational noise effects. PhenotypeSimulator is an R package that combines the phenotype components into a final phenotype while controlling for the proportion of variance explained by each of the components. Users can customize, for each effect component, the number of variables, their distribution and the design of their effect across traits.
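As a rough illustration of combining components with controlled variance proportions, the Python sketch below rescales each component to a target variance before summing. It does not use PhenotypeSimulator's R API; the function name `scale_to_variance`, the three components and the proportions are all assumptions chosen for the example.

```python
import random
import statistics

def scale_to_variance(values, target_var):
    """Centre a component and rescale it so its population variance equals target_var."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)
    factor = (target_var / var) ** 0.5
    return [(v - mean) * factor for v in values]

rng = random.Random(42)
n = 500

# Hypothetical raw components: one causal SNP (0/1/2 coding), one covariate, noise.
genotype = [rng.choice([0, 1, 2]) for _ in range(n)]
covariate = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]

# Assumed proportions of variance explained; they sum to 1.
proportions = {"genetic": 0.3, "covariate": 0.2, "noise": 0.5}
raw = {"genetic": genotype, "covariate": covariate, "noise": noise}

components = {
    name: scale_to_variance(raw[name], proportions[name]) for name in raw
}

# Final phenotype: sum of the rescaled components for each individual.
phenotype = [sum(parts) for parts in zip(*components.values())]
```

Each component has exactly its target variance, but the total phenotype variance is only approximately 1 because the randomly drawn components are not perfectly uncorrelated in a finite sample.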
A rapid moving-window algorithm to simulate genotype data for case-control or population samples from genomic SNP chips. For case-control data, GWAsimulator generates cases and controls according to a user-specified multi-locus disease model, and can simulate specific regions if desired. The program uses phased genotype data as input and has the flexibility of simulating genotypes for different populations and different genomic SNP chips.
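To make the notion of a user-specified multi-locus disease model concrete, here is a small Python sketch. It is not GWAsimulator's moving-window algorithm: it assumes independent loci (no LD), a logistic disease model, and simple rejection sampling until the requested numbers of cases and controls are reached. All parameter names and values are hypothetical.

```python
import math
import random

def disease_probability(genotypes, baseline_log_odds, log_odds_ratios):
    """Logistic multi-locus model: each risk allele adds its log odds ratio."""
    eta = baseline_log_odds + sum(
        beta * g for beta, g in zip(log_odds_ratios, genotypes)
    )
    return 1 / (1 + math.exp(-eta))

def simulate_case_control(n_cases, n_controls, mafs,
                          baseline_log_odds, log_odds_ratios, rng):
    """Draw individuals from the population until both quotas are filled."""
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Genotype = number of risk alleles among two independent draws per locus.
        genotypes = [sum(rng.random() < maf for _ in range(2)) for maf in mafs]
        risk = disease_probability(genotypes, baseline_log_odds, log_odds_ratios)
        affected = rng.random() < risk
        if affected and len(cases) < n_cases:
            cases.append(genotypes)
        elif not affected and len(controls) < n_controls:
            controls.append(genotypes)
    return cases, controls

rng = random.Random(7)
cases, controls = simulate_case_control(
    n_cases=100, n_controls=100,
    mafs=[0.2, 0.3],                              # assumed risk-allele frequencies
    baseline_log_odds=-2.0,                       # assumed baseline risk
    log_odds_ratios=[math.log(1.5), math.log(2.0)],  # assumed per-allele effects
    rng=rng,
)
```

Because cases are ascertained by disease status, risk alleles are expected to be enriched among them relative to controls, which is the signal an association test looks for.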
Simulates region/gene-level genotype and phenotype data for complex and Mendelian traits for any given pedigree structure. RarePedSim can generate data conditionally or unconditionally on pedigree members' qualitative or quantitative phenotypes. It employs realistic population demographic models to mimic sequence data. This tool annotates variant sites with positions, allele frequencies and functionalities.
Simulates genome-wide association study (GWAS) summary data. simGWAS is an R package that models data in the context of case-control studies. The program can be used to reproduce summary data under a specific hypothesis about the location and magnitude of genetic effects, or to assess the output of fine-mapping applied to real data. It can be applied to any required causal model and set of odds ratios.
Generates data sets of families for use in linkage and association studies. SIMLA gives the user flexibility in specifying marker and disease placement, locus heterogeneity, and disequilibrium between markers and disease loci. It allows simulation of linkage and association for multiple markers in extended pedigrees, nuclear families or sets of unrelated cases and controls. The program is useful for studying and comparing existing statistical tests, developing new genetic linkage and association statistics, planning sample sizes for new studies, and interpreting genetic analysis results.
Provides a flexible simulation tool for pathway-based genome-wide association studies using real genetic data from the HapMap project or supplied by users. Pathsimu can simultaneously simulate multiple quantitative phenotypes and genome-wide genotype data under user-assigned parameters, such as the number of simulations, the genetic model (additive or epistatic), the names or sizes of causal pathways, the numbers and genetic effects (main and interaction effects) of disease genes, and the minor allele frequency ranges of causal SNPs in disease genes. Pathsimu can be used to develop novel multiple-gene association study approaches, for instance by evaluating the impact of genetic parameters on the power of pathway-based association tests and by comparing the performance of different approaches under various parameter settings. Pathsimu is designed to output data in PLINK file format and to be easily extendable.
Performs stochastic simulations of plant and animal breeding programs. AlphaSimR is the successor to the 'AlphaSim' software for breeding program simulation. Most simulations follow a general structure consisting of four steps: (1) creation of founder haplotypes, (2) setting simulation parameters, (3) modeling of the breeding program, and (4) examination of the results. It contains classes and functions allowing users to simulate a wide range of complex plant and animal breeding programs.
Simulates realistic samples for genome-wide association studies (GWAS). simuGWAS simulates populations that closely resemble the complex structure of the human genome, while allowing the introduction of signals of natural selection. It can simulate realistic samples to evaluate the performance of a wide variety of statistical gene mapping methods for GWAS. Compared to other simulation methods, the tool simulates samples with existing genetic markers that resemble human populations well in terms of marker allele frequency and linkage disequilibrium (LD) structure, with additional flexibility to simulate genomic regions with signals of natural selection.
A software tool to add a phenotype to genotypes generated in time-efficient coalescent simulations. Both qualitative and quantitative phenotypes can be generated and it is possible to partition phenotypic variation between additive effects and epistatic interactions between causal variants. The output formats of phenosim are directly usable as input for different GWAS tools. The applicability of phenosim is shown by simulating a genome-wide association study in Arabidopsis thaliana.
Integrates two forms of molecular data with multiple clinical endpoints. CC-PROMISE is an R package that combines canonical correlation (CC), a classical method used to evaluate the association of two multivariate data sets with one another, and PROMISE, a method to integrate one form of molecular data with multiple endpoints. The software also includes a probe-level analysis.
Aids users in simulating markers that mimic real ones in terms of allele frequencies and linkage disequilibrium (LD). DHOEM is a simulation tool that exploits real data characteristics. It has been developed to assist simulation studies in quantitative genetics and selection. This method also allows the user to specify the desired marker density, with a user-defined minor allele frequency (MAF) limit.
Simulates population genetic data. GPOPSIM is based on the mutation-drift equilibrium (MDE) model. It simulates genetic and phenotypic values for each individual in the current population using the genome structure generated in the historical population together with the trait and quantitative trait loci (QTL) parameters. This tool is useful for data simulation in genetic or breeding research that needs genomic and phenotypic data from a population.
Simulates genomic data with rare variants. SimuRare is a regression-based algorithm for imputing rare variants in single nucleotide polymorphism (SNP) array data. It can apply a resampling approach to any haplotype data to simulate samples that include both common and rare SNPs. This method aims to better preserve realistic linkage disequilibrium (LD) and minor allele frequencies (MAFs).
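How a historical population can approach mutation-drift equilibrium is sketched below in Python with a generic Wright-Fisher model. GPOPSIM's own implementation is not described here, so this is only an assumed illustration: loci are biallelic and independent, mutation is symmetric, and the function name, the parameters and the starting frequency are hypothetical.

```python
import random

def wright_fisher(n_individuals, n_loci, n_generations,
                  mutation_rate, rng, start_freq=0.5):
    """Track allele frequencies at independent biallelic loci.

    Each generation applies symmetric mutation to the expected frequency,
    then resamples 2N gene copies (binomial sampling, i.e. genetic drift).
    """
    n_copies = 2 * n_individuals
    freqs = [start_freq] * n_loci
    for _ in range(n_generations):
        next_freqs = []
        for p in freqs:
            # Frequency after mutation in both directions at the same rate.
            p_mut = p * (1 - mutation_rate) + (1 - p) * mutation_rate
            # Drift: draw each of the 2N copies independently.
            count = sum(1 for _ in range(n_copies) if rng.random() < p_mut)
            next_freqs.append(count / n_copies)
        freqs = next_freqs
    return freqs

rng = random.Random(3)
final_freqs = wright_fisher(
    n_individuals=50, n_loci=20, n_generations=100,
    mutation_rate=1e-3, rng=rng,
)
```

Drift pushes loci towards loss or fixation while mutation reintroduces variation; after enough generations the distribution of allele frequencies across loci stabilises, and a tool of this kind would then assign trait and QTL effects on top of that genome structure.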
Generates complex biallelic single nucleotide polymorphism (SNP) disease models for simulation studies. GAMETES rapidly and precisely generates random, pure, strict n-locus models with specified genetic constraints. It includes a simple dataset simulation strategy which may be utilized to rapidly generate an archive of simulated datasets for given genetic models. The tool could be employed to pursue theoretical characterization of genetic models and epistasis.
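The following Python sketch shows what a purely epistatic two-locus model looks like in terms of a penetrance table. It does not reproduce GAMETES's generation algorithm; it only checks a well-known textbook-style table (an XOR-like pattern at minor allele frequency 0.5, chosen here as an assumption) for the defining property that every single-locus marginal penetrance equals the overall prevalence.

```python
def genotype_freqs(maf):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the minor allele."""
    q = maf
    p = 1 - q
    return [p * p, 2 * p * q, q * q]

def marginal_penetrances(table, freqs_a, freqs_b):
    """Return prevalence and the marginal penetrance of each genotype at each locus."""
    prevalence = sum(
        freqs_a[i] * freqs_b[j] * table[i][j]
        for i in range(3) for j in range(3)
    )
    marg_a = [sum(freqs_b[j] * table[i][j] for j in range(3)) for i in range(3)]
    marg_b = [sum(freqs_a[i] * table[i][j] for i in range(3)) for j in range(3)]
    return prevalence, marg_a, marg_b

# Rows: genotype at locus A (0, 1, 2 minor alleles); columns: genotype at locus B.
# Illustrative penetrance values, not taken from any GAMETES output.
xor_table = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]

freqs = genotype_freqs(0.5)
prevalence, marg_a, marg_b = marginal_penetrances(xor_table, freqs, freqs)

# Neither locus has a main effect if all marginals equal the prevalence.
is_pure = all(abs(m - prevalence) < 1e-12 for m in marg_a + marg_b)
```

In this table disease status depends entirely on the genotype combination, so a single-SNP test would see no effect at either locus even though the two loci jointly determine risk.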
Helps users perform simulations using available results of re-sequencing and genomics data. The crossword tool consists of a data-driven simulation language useful for designing genetic-mapping experiments and breeding strategies. This program can simulate reads from a genomic sequence generated from a set of individuals in a simulation.