GATK-Queue / Genome Analysis Toolkit-Queue

A command-line scripting framework for defining multi-stage genomic analysis pipelines, combined with an execution manager that runs those pipelines from end to end. Processing genome data often involves several steps to produce outputs. For example, our BAM-to-VCF calling pipeline includes, among other things: local realignment around indels; emitting raw SNP calls; emitting indels; masking the SNPs at indels; annotating SNPs using chip data; labeling suspicious calls based on filters; and creating a summary report with statistics. Running these tools one by one in series can take weeks, or requires custom scripting to exploit parallel resources. With a Queue script, users can semantically define the multiple steps of the pipeline and then hand off the logistics of running the pipeline to completion. Queue runs independent jobs in parallel, handles transient errors, and uses techniques such as running multiple copies of the same program on different portions of the genome to produce outputs faster.
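The idea of running copies of the same program on different portions of the genome and then merging the results can be sketched in a few lines of Python. This is only an illustration of the general scatter-gather pattern, not Queue's own scripting interface: `split_region`, `call_chunk`, and `scatter_gather` are hypothetical names, the "variant caller" is a placeholder, and retrying on `RuntimeError` is an assumed stand-in for transient-error handling.

```python
from concurrent.futures import ThreadPoolExecutor


def split_region(start, end, n_chunks):
    """Split the half-open interval [start, end) into n_chunks contiguous pieces."""
    size, extra = divmod(end - start, n_chunks)
    chunks, pos = [], start
    for i in range(n_chunks):
        step = size + (1 if i < extra else 0)
        chunks.append((pos, pos + step))
        pos += step
    return chunks


def call_chunk(chunk):
    """Hypothetical stand-in for one pipeline stage run on one genome chunk.

    It simply reports every position divisible by 1000 as a "variant".
    """
    start, end = chunk
    return [p for p in range(start, end) if p % 1000 == 0]


def scatter_gather(start, end, n_chunks, worker=call_chunk, max_retries=2):
    """Run worker on each chunk in parallel and concatenate results in order."""

    def run_with_retry(chunk):
        # Assumed policy: retry a chunk a few times on RuntimeError, then give up.
        for attempt in range(max_retries + 1):
            try:
                return worker(chunk)
            except RuntimeError:
                if attempt == max_retries:
                    raise

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(run_with_retry, split_region(start, end, n_chunks)))
    return [item for part in parts for item in part]
```

For example, `scatter_gather(0, 5000, 4)` processes four chunks of 1250 positions each and returns the merged calls in genomic order.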

MAQ / Mapping and Assembly with Quality

Builds mapping assemblies from short reads generated by next-generation sequencing machines. Maq is designed particularly for the Illumina-Solexa 1G Genetic Analyzer and has preliminary functions to handle ABI SOLiD data. Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, Maq performs ungapped alignment. For single-end reads, Maq is able to find all hits with up to 2 or 3 mismatches, depending on a command-line option; for paired-end reads, it always finds all paired hits with one of the two reads containing up to 1 mismatch. At the assembling stage, Maq calls the consensus based on a statistical model.
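To make "ungapped alignment with a bounded number of mismatches" concrete, here is a naive Python sketch. It scans every reference offset and counts mismatches directly; Maq itself uses an indexing scheme to do this efficiently, which is not reproduced here. The function name `ungapped_hits` and its parameters are hypothetical.

```python
def ungapped_hits(read, reference, max_mismatches=2):
    """Return (position, mismatches) for every ungapped placement of read
    on reference that has at most max_mismatches mismatching bases."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = 0
        for a, b in zip(read, window):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, abandon this offset
        else:
            # Loop finished without exceeding the mismatch budget.
            hits.append((pos, mismatches))
    return hits
```

Calling `ungapped_hits("ACGT", "ACGTACGA", max_mismatches=1)` reports an exact hit at offset 0 and a one-mismatch hit at offset 4.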


A platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. It can be used to detect different types of variation: 1) germline variants (SNPs and indels) in individual samples or pools of samples, 2) multi-sample variants (shared or private) in multi-sample datasets (with mpileup), 3) somatic mutations, LOH events, and germline variants in tumor-normal pairs and 4) somatic copy number alterations (CNAs) in tumor-normal exome data.


A statistical tool for calling common and rare variants in the analysis of pooled or individual next-generation sequencing data. SNVer reports a single overall p-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. Loci with any (even low) coverage can be tested, and depth of coverage is quantitatively factored into the final significance calculation. SNVer runs very fast, making it feasible for the analysis of whole-exome or even whole-genome sequencing data.
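A simplified way to see how a single per-locus p-value can account for depth of coverage is a one-sided binomial test of the alternate-allele read count against an assumed sequencing error rate. The sketch below is not SNVer's actual model; the function names, the default `error_rate` of 0.01, and the use of a Bonferroni correction for multiplicity control are all illustrative assumptions.

```python
from math import comb


def binomial_tail_pvalue(alt_reads, depth, error_rate=0.01):
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    A small value suggests the alternate reads are unlikely to be
    explained by sequencing errors alone (assumed null hypothesis).
    """
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def bonferroni_significant(pvalues, alpha=0.05):
    """Flag loci whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(pvalues)
    return [p <= threshold for p in pvalues]
```

With this test, 5 alternate reads out of 20 give a much smaller p-value than 5 alternate reads out of 200, so the same read count is weighed differently depending on coverage.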


A sensitive and robust approach for calling single-nucleotide variants (SNVs) from high-coverage sequencing datasets, based on a formal model for biases in sequencing error rates. LoFreq adapts automatically to sequencing run and position-specific sequencing biases and can call SNVs at a frequency lower than the average sequencing error rate in a dataset. LoFreq’s robustness, sensitivity and specificity were validated using several simulated and real datasets (viral, bacterial and human) and on two experimental platforms (Fluidigm and Sequenom).
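One way to let position-specific error rates enter a significance test is to give each read base its own error probability (here derived from a Phred quality score) and compute the tail of the resulting Poisson-binomial distribution. The sketch below illustrates that general idea only; it is an assumption for illustration and should not be read as LoFreq's exact implementation. The function names are hypothetical.

```python
def phred_to_error(quality):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-quality / 10)


def poisson_binomial_tail(error_probs, k):
    """P(at least k errors) when base i is wrong with probability error_probs[i].

    Uses a simple dynamic program over the bases: dist[j] holds the
    probability of exactly j errors among the bases processed so far.
    """
    dist = [1.0]
    for p in error_probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)      # this base is correct
            nxt[j + 1] += mass * p        # this base is an error
        dist = nxt
    return sum(dist[k:])
```

For a column of bases with qualities `quals` and `k` observed non-reference bases, `poisson_binomial_tail([phred_to_error(q) for q in quals], k)` gives the probability of seeing at least `k` mismatches from errors alone under this assumed model.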

SPLINTER / Short indel Prediction by Large deviation Inference and Nonlinear True frequency Estimation by Recursion

Detects and quantifies short indels and substitutions in large pools. SPLINTER integrates information from a synthetic DNA library to tune itself and to quantify specificity and sensitivity for every experiment, which allows accurate detection and quantification of short insertions, deletions, and substitutions.


A variant detector and graphical alignment viewer for next-generation sequencing data in the SAM/BAM format, which is capable of pooling data from multiple source files. The variant detector takes advantage of SAM-specific annotations, and produces detailed output suitable for genotyping and identification of somatic mutations. The assembly viewer can display reads in the context of either a user-provided or automatically generated reference sequence, retrieve genome annotation features from a UCSC genome annotation database, display histograms of non-reference allele frequencies, and predict protein-coding changes caused by SNPs.

CRISP / Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms

Detects SNPs and short indels from high-throughput sequencing of pooled DNA samples. CRISP has been primarily developed to analyze data from "artificial" DNA pools, i.e. pools generated by equi-molar pooling of DNA from multiple individual samples. CRISP leverages sequence data from multiple such pools to detect both rare and common variants. Note that the method is not designed for variant detection from a single pool. CRISP was developed for targeted disease association studies in humans but may work well for other applications.


A fast and easy desktop GUI tool for the identification of genomic variants from pooled sequencing and individual sequencing data. Using SNVerGUI, users can perform sophisticated variant detection by simply configuring several parameters in a friendly graphical user interface. Compared with other methods for variant calling, our approach is unique in that it is applicable to both individual and pooled sequencing data. SNVerGUI supports commonly used input and output file formats, which allows it to be seamlessly integrated into common NGS data analysis pipelines.

CoVaCS / Consensus Variant Calling System

Enables genotyping and variant annotation of resequencing data produced by second-generation, or next-generation sequencing (NGS), technologies. CoVaCS is an automated system that provides tools for variant calling and annotation along with a pipeline for the analysis of whole genome shotgun (WGS), whole exome sequencing (WES) and targeted resequencing (TGS) data. The software allows non-specialists to perform all steps from quality trimming to variant annotation.


An integrated tool set and automated workflow to allow robust and reliable identification of sequence variants present in a subset of sequences within a tagged input DNA sample. DeepSNVMiner makes available the analysis procedure required to support SafeSeqs and similar unique sequence identifier tagged sequence datasets. DeepSNVMiner has been built to allow easy automation and reproducibility and makes this technique available to a wide range of applications. The workflow remains flexible such that it may be customised to variants of the data production protocol used, and supports reproducible analysis through detailed logging and reporting of results.

OPENDoRM / Optimization of Pooled Experiments in NGS for Detection of Rare Mutations

Assists in planning next-generation sequencing (NGS) experiments. OPENDoRM can be split into four components: (i) global settings for the NGS experiment; (ii) data processing; (iii) visual exploration; and (iv) data interpretation. It is able to: (i) describe the pooling of high-throughput generated data using four different algorithms; (ii) identify the optimal number of patients in each pool with respect to minimizing the cost of the experiment; and (iii) generate easy-to-read reports and charts for a better understanding of the experiment planning.