
GMDR / Generalized Multifactor Dimensionality Reduction

Identifies gene-by-gene and gene-by-environment interactions. GMDR lets users detect multifactor interactions in large-scale data. The software implements a set of methods for analyzing interactions under diverse study designs, such as case-control designs, family-based designs, or a combination of both. GMDR also provides large-scale data management and preprocessing features. It can help reveal the genetic architecture of complex traits in terms of their underlying gene-gene interactions.

MAGNAMWAR / Mono-Associated GNotobiotic Animals Metagenome-Wide Association R

Enables bacterial genome-wide association (BGWA) to predict bacterial genes that influence organismal traits. MAGNAMWAR can be used to identify bacterial determinants of fruit fly nutrition and to define the genetic relationship between genome-sequenced bacteria or metagenomes and any organismal phenotype. Moreover, the software simplifies pre-formatting, analysis, and graphical presentation of the data, incorporates bacterial population structure into analyses, and permits the use of additional statistical tests.


NGSCheckMate

Verifies sample identities from FASTQ, BAM or VCF files. NGSCheckMate uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms (SNPs), considering depth-dependent behavior of similarity metrics for identical and unrelated samples. It is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth. The tool can be used as a quality control step in next-generation sequencing (NGS) studies.
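To make the idea of comparing allele read fractions concrete, the following is a minimal Python sketch. It is not NGSCheckMate's code or interface: the function name `pearson` and the example fractions are hypothetical, and the depth-dependent model described above is left out. It only shows that two samples from the same individual tend to have highly correlated allele fractions at shared SNPs, while unrelated samples do not.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of allele fractions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 SNPs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("allele fractions must vary across SNPs")
    return cov / (sqrt(var_x) * sqrt(var_y))

# Hypothetical allele fractions at five known SNPs (illustrative values only).
sample_a = [0.0, 0.5, 1.0, 0.5, 0.0]
sample_a_replicate = [0.02, 0.48, 0.97, 0.55, 0.0]
sample_b = [1.0, 0.0, 0.5, 0.0, 1.0]

same = pearson(sample_a, sample_a_replicate)   # close to 1: likely the same individual
different = pearson(sample_a, sample_b)        # low or negative: likely unrelated
```

In practice the cutoff separating matched from unmatched pairs would have to depend on sequencing depth, which is the part this sketch omits.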


DNABarcodes

Creates DNA barcode sets capable of correcting insertion, deletion, and substitution errors. DNABarcodes generates sets from a few basic input parameters (e.g. length, distance metric, minimum distance, chemical properties), which covers most experimental demands in the de novo design of barcodes. Additionally, the package allows analysing existing sets of DNA barcodes and generating subsets of those sets to improve their error correction and detection properties. Finally, reads that start with a (possibly mutated) barcode can be demultiplexed, i.e., assigned to their original reference barcode. DNABarcodes was designed for speed, versatility, provable correctness and large set sizes.
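The link between minimum distance and demultiplexing can be sketched in a few lines of Python. This is an illustration only and not the DNABarcodes API (the package itself is written for R): the function names are hypothetical, and the sketch uses the Hamming distance, so it covers substitution errors but not the insertions and deletions that the package also handles. A set with minimum distance d can correct up to (d - 1) // 2 substitutions.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_distance(barcodes):
    """Smallest pairwise Hamming distance within a barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def demultiplex(read: str, barcodes):
    """Assign the read's prefix to the nearest barcode, or None if the
    prefix has more errors than the set can correct."""
    prefix = read[: len(barcodes[0])]
    radius = (min_distance(barcodes) - 1) // 2
    best = min(barcodes, key=lambda b: hamming(prefix, b))
    return best if hamming(prefix, best) <= radius else None

# Hypothetical barcode set with minimum distance 5 (corrects 2 substitutions).
barcode_set = ["AAAAA", "CCCCC", "GGGGG"]
assigned = demultiplex("AACAATTTT", barcode_set)   # one substitution in the barcode
rejected = demultiplex("ACGTACCC", barcode_set)    # too far from every barcode
```

Rejecting reads outside the correction radius, rather than always taking the nearest barcode, is what keeps a heavily mutated read from being assigned to the wrong sample.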


HeritSeq

Integrates methods to estimate the heritability of gene expression from next-generation sequencing (NGS) data, performs hypothesis testing, and provides confidence intervals. HeritSeq is an R package that can also generate simulated sequencing data under either negative binomial or compound Poisson mixed models. Variance partition coefficients (VPC) are computed using linear mixed effects and generalized linear mixed effects models, including compound Poisson and negative binomial models.


TraitRateProp

Provides statistical means to infer the strength of association between the rate of sequence evolution in a given genomic region and a phenotype of interest. TraitRateProp is a web server that tests the hypothesis of an association between the sequence evolutionary rate and the phenotypic trait. It then computes per-site predictions, where high scores indicate sequence sites whose rate is more likely to be associated with the phenotypic trait. When 3D structure information for the protein is provided, the site-specific scores are mapped onto this structure, allowing the detection of putatively functional sequence sites that are spatially close to each other in 3D space.

GNOVA / GeNetic cOVariance Analyzer

Estimates annotation-stratified genetic covariance between traits using genome-wide association studies (GWAS) summary statistics. GNOVA provides accurate covariance estimates and powerful statistical inference that are robust to linkage disequilibrium (LD) and sample overlap. It was applied to estimate genetic correlations for 50 complex traits using publicly available GWAS summary statistics. The results show that the tool is more powerful than LD score regression (LDSC) when genetic correlation is moderate.

AKT / Ancestry and Kinship Toolkit

Detects related samples, characterises sample ancestry, calculates correlation between variants, checks Mendelian consistency and performs data clustering. AKT is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. AKT helps in cases where metadata about the samples may be missing or unreliable, allowing easy inference of ancestry and relatedness from the data itself.

MeRP / Mendelian Randomization Pipeline

Facilitates rapid causal inference analysis by automating key steps in developing and analyzing genetic instruments obtained from publicly available data. MeRP uses the National Human Genome Research Institute catalog of associations to generate instrumental variable trait files and provides methods for filtering potential confounding associations as well as linkage disequilibrium. MeRP generates estimated causal effect scores via an MR-score analysis using summary data for disease endpoints typically found in the public domain.

PhyResSE / Phylo-Resistance-Search-Engine

Delineates strain lineage and antibiotic resistance. PhyResSE is a web-based tool designed to enable nonspecialized users to extract phylogenetic and resistance information from next-generation sequencing (NGS) data. The software enables the automated interpretation of Mycobacterium tuberculosis complex (MTBC) whole-genome sequencing (WGS) data for the identification of resistance-mediating variants and phylogenetic lineage classification. It opens the way for a wider application of WGS in the mycobacteriological laboratory for day-to-day use.


CollapsABEL

Provides a computationally efficient solution for screening general forms of compound heterozygosity (CH) alleles in densely imputed microarray or whole genome sequencing datasets. The generalized compound double heterozygosity (GCDH) test provides improved power over single-SNP based methods in detecting the prevalence of CH in human complex phenotypes, offering an opportunity for tackling the missing heritability problem. CollapsABEL provides a user-friendly pipeline for genotype collapsing, statistical testing, power estimation, type I error control and graphics generation in the R language. CollapsABEL may help find novel gene variants that explain additional proportions of the missing heritability for a wide range of human complex traits and diseases.


Gracob / GRAph-based Constant-cOlumn Biclustering

Consists of a deterministic graph-based method designed to find maximal constant-column biclusters in any given data matrix. GRAph-based Constant-cOlumn Biclustering (Gracob) was developed to discover co-fit genes from large growth phenotype profiling data sets. It takes advantage of the sparsity of biclusters: the number of biclusters in a matrix is small compared to the size of the input data matrix. Gracob consists of three main phases: 1) the pre-processing phase, 2) the graph creation phase, and 3) the maximal clique finding phase.


CanSNPer

Allows hierarchical genotype classification of clonal pathogens based on canonical single nucleotide polymorphisms (SNPs). CanSNPer is a genotype classification pipeline that stores information on canonical SNPs, extracts known canonical SNPs from a query draft sequence to classify the pathogen isolate, and can generate a visual representation of all SNPs in the canonical SNP tree for the query sequence.

VIGoR / Variational Bayesian Inference for Genome-Wide Regression

Can be used for genome-wide regression. VIGoR implements seven regression methods: Bayesian lasso (BL), extended Bayesian lasso (EBL), weighted Bayesian shrinkage regression (wBSR), BayesB, BayesC, stochastic search variable selection (SSVS), and Bayesian mixture regression (MIX). It is optimized for genome-wide association mapping and whole-genome prediction, which use a large number of DNA markers as the explanatory variables. The tool can be applied to various problems where variable selection is required for very large data sets.


PloidyNGS

A workflow to visualize and explore ploidy levels in a newly sequenced genome, exploiting short read data. The analysis workflow consists of four steps: (i) it stores, in a data structure, for each position, the number of reads supporting different nucleotides; (ii) it traverses the data structure, ignoring positions where a single nucleotide was observed and where the most frequent nucleotide had a frequency above a cutoff; (iii) it orders the putative allele percentages at each position from lowest to highest; (iv) finally, it generates a histogram to help decide the likely ploidy level of the organism under study. PloidyNGS is a useful tool for visually assessing the ploidy of organisms for which Next Generation Sequencing (NGS) short-read data are available.
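The four steps can be sketched in Python as follows. This is a hedged illustration, not PloidyNGS itself: the function names, the input format (one string of observed bases per genome position) and the `max_major` cutoff of 0.95 are all assumptions made for the example.

```python
from collections import Counter

def allele_percentages(pileup_columns, max_major=0.95):
    """Steps i-iii: count nucleotides per position, skip uninformative
    positions, and return sorted allele percentages for the rest.
    max_major is a hypothetical cutoff for the most frequent nucleotide."""
    kept = []
    for column in pileup_columns:
        counts = Counter(column)                 # step i: reads per nucleotide
        if len(counts) < 2:                      # step ii: single nucleotide seen
            continue
        total = sum(counts.values())
        fractions = sorted(c / total for c in counts.values())  # step iii
        if fractions[-1] > max_major:            # step ii: near-homozygous site
            continue
        kept.append([round(100 * f, 1) for f in fractions])
    return kept

def histogram(percentages, bin_width=10):
    """Step iv: bin all allele percentages; returns {bin_start: count}."""
    bins = Counter()
    for position in percentages:
        for p in position:
            bins[int(min(p, 99.9) // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))

# Hypothetical pileup: each string holds the bases observed at one position.
columns = ["AAAAAAAAAA", "AAAAACCCCC", "AAAAAACCC", "A" * 99 + "C"]
percentages = allele_percentages(columns)
bins = histogram(percentages)
```

In a histogram like this, a peak near 50% is what a diploid genome would be expected to show, while peaks near 33% and 67% would point towards a triploid.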


PathogenFinder

Allows users to upload raw reads obtained from different next generation sequencing (NGS) platforms and get a fast estimate of the pathogenic potential of the bacteria they are studying. This web server can analyze and identify genomic features associated with both pathogenicity and non-pathogenicity. PathogenFinder could be helpful in possible bacterial outbreaks, and it follows the direction that modern clinical microbiology and global epidemiology are taking, driven by high-throughput DNA sequencing technologies.


seXY

Accepts called genotype data and jointly considers information on the X and Y chromosomes. seXY is a logistic regression model trained on both X chromosome heterozygosity and Y chromosome missingness that consistently demonstrated >99.5% sex inference accuracy in cross-validation for 889 males and 5,361 females enrolled in prostate cancer and ovarian cancer genome-wide association studies (GWAS). Compared to PLINK, one of the most popular tools for sex inference in GWAS, which assesses only X chromosome heterozygosity, seXY achieved marginally better male classification and 3% more accurate female classification.


Trinculo

A fast, flexible and easy-to-use tool for multi-category genetic association studies. By providing a wide range of options, it allows users to tailor the analysis to their data and experimental design. For instance, if the user wishes to carry out model selection at a risk variant while accounting for the effect of a second risk variant in linkage disequilibrium, Trinculo’s conditional regression option handles this automatically. Other use cases, such as multinomial fine-mapping or ordinal logistic regression, are also included in the software.

EPS / Empirical Bayes approach to integrating Pleiotropy and tissue-Specific information

A statistical approach that integrates pleiotropy information from GWAS data with tissue-specific gene expression data. Compared with some existing approaches, such as linear mixed models, which require genotype data at the individual level, EPS only requires summary statistics for analysis. EPS enables rigorous hypothesis testing of pleiotropy and tissue-specific risk gene expression patterns. All of the model parameters can be adaptively estimated by an expectation–maximization (EM) algorithm.


SNP2RFLP

Identifies region-specific single nucleotide polymorphisms (SNPs) in which the polymorphic nucleotide creates a restriction fragment length polymorphism (RFLP) that can be readily assayed at the benchtop using restriction enzyme digestion of SNP-containing PCR products. SNP2RFLP permits user-defined queries that maximize the number of informative markers for a specific application and allows retrieval of an adequate and manageable number of markers. This tool facilitates fine-mapping in a region containing a mutation of interest.
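The core check, whether one allele of a SNP creates or destroys a restriction site, can be sketched in Python. This is an illustration under stated assumptions and not SNP2RFLP's implementation: the function names and example sequences are hypothetical, and only the forward strand is scanned, which is sufficient for palindromic recognition sites such as EcoRI's GAATTC.

```python
def site_count(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))

def snp_creates_rflp(left: str, ref: str, alt: str, right: str, site: str) -> bool:
    """True if the two alleles differ in the number of recognition sites,
    so that digestion would give different fragment patterns."""
    ref_seq = left + ref + right
    alt_seq = left + alt + right
    return site_count(ref_seq, site) != site_count(alt_seq, site)

ECORI = "GAATTC"  # EcoRI recognition site

# Hypothetical flanking sequences: the A allele completes GAATTC, G does not.
informative = snp_creates_rflp("ACGTGA", "A", "G", "TTCGGA", ECORI)
uninformative = snp_creates_rflp("ACGT", "A", "C", "ACGT", ECORI)
```

A real query would repeat this check across many enzymes and all SNPs in the region of interest, then keep only the SNPs for which some enzyme distinguishes the alleles.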