
GATK / Genome Analysis ToolKit

Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute that comprises several tools and can support projects of any size. The application provides an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in formats such as SAM, BAM, CRAM, and VCF. The website includes extensive documentation to guide users.

GATK-Queue / Genome Analysis Toolkit-Queue

A command-line scripting framework for defining multi-stage genomic analysis pipelines, combined with an execution manager that runs those pipelines from end to end. Processing genome data often involves several steps to produce outputs; for example, our BAM to VCF calling pipeline includes, among other things: local realignment around indels; emitting raw SNP calls; emitting indels; masking the SNPs at indels; annotating SNPs using chip data; labeling suspicious calls based on filters; and creating a summary report with statistics. Running these tools one by one in series can take weeks, or requires custom scripting to make use of parallel resources. With a Queue script, users can semantically define the multiple steps of the pipeline and then hand off the logistics of running the pipeline to completion. Queue runs independent jobs in parallel, handles transient errors, and uses techniques such as running multiple copies of the same program on different portions of the genome to produce outputs faster.
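The idea of running copies of the same program on different portions of the genome can be illustrated with a minimal Python sketch. This is not Queue's own scripting interface; the function names, the chunking rule, and the choice of `OSError` as the "transient" error are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_intervals(contig_lengths, chunk_size):
    """Scatter step: cut each contig into (contig, start, end) chunks (1-based, inclusive)."""
    intervals = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            intervals.append((contig, start, end))
    return intervals


def run_with_retries(job, interval, max_attempts=3):
    """Re-run a job that raises OSError (our stand-in for a transient failure)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job(interval)
        except OSError:
            if attempt == max_attempts:
                raise


def scatter_gather(job, contig_lengths, chunk_size, workers=4):
    """Run `job` on every interval in parallel, then concatenate the results in order."""
    intervals = split_into_intervals(contig_lengths, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input intervals.
        chunks = list(pool.map(lambda iv: run_with_retries(job, iv), intervals))
    # Gather step: flatten the per-interval outputs into one list.
    return [record for chunk in chunks for record in chunk]


def toy_job(interval):
    """Hypothetical per-interval job; a real one would invoke an analysis tool."""
    contig, start, end = interval
    return [f"{contig}:{start}-{end}"]


if __name__ == "__main__":
    print(scatter_gather(toy_job, {"chr1": 250, "chr2": 100}, chunk_size=100))
```

In a real pipeline the per-interval job would launch an external tool and the gather step would merge output files rather than Python lists.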

GATK VariantRecalibrator

Builds a recalibration model to score variant quality for filtering purposes. VariantRecalibrator performs the first pass in a two-stage process called VQSR; the second pass is performed by the ApplyRecalibration tool. In brief, the first pass consists of creating a Gaussian mixture model by looking at the distribution of annotation values over a high quality subset of the input call set, and then scoring all input variants according to the model. The second pass consists of filtering variants based on score cutoffs identified in the first pass.
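The two passes can be sketched in plain Python under strong simplifying assumptions: a single annotation, a one-dimensional mixture fitted by expectation-maximization, and a cutoff chosen so that a target fraction of the high-quality training variants is retained. None of the names below come from GATK, and the actual tools work on several annotations with a more elaborate scoring scheme.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, n_components=2, n_iter=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture by EM; returns a list of (weight, mean, variance)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialization: place the means at evenly spaced quantiles.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [w * gaussian_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, min_var)
    return list(zip(weights, means, variances))


def log_density(x, model):
    """Score a variant: log of the mixture density at its annotation value."""
    dens = sum(w * gaussian_pdf(x, m, s) for w, m, s in model)
    return math.log(max(dens, 1e-300))


def choose_cutoff(training_scores, sensitivity=0.9):
    """Pass 1 output: the score that keeps `sensitivity` of the training variants."""
    ordered = sorted(training_scores, reverse=True)
    keep = max(1, math.ceil(sensitivity * len(ordered)))
    return ordered[keep - 1]


def apply_filter(values, model, cutoff):
    """Pass 2: label each variant by comparing its score with the cutoff."""
    return [(v, "PASS" if log_density(v, model) >= cutoff else "FILTERED") for v in values]


if __name__ == "__main__":
    # Hypothetical annotation values for a high-quality training subset.
    training = [19.0, 20.0, 21.0, 20.5, 19.5, 29.0, 30.0, 31.0, 30.5, 29.5]
    model = fit_gmm_1d(training)
    cutoff = choose_cutoff([log_density(v, model) for v in training])
    print(apply_filter([20.0, 25.0, 2.0], model, cutoff))
```

Variants whose annotation values fall far from the training distribution receive a low score and are filtered; the sensitivity parameter controls how strict the cutoff is.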

RG / ReliableGenome

Calculates concordant and discordant regions with respect to a set of surveyed variant calling (VC) pipelines. ReliableGenome combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. Our method can be applied to arbitrary VC pipelines and the resulting genomic partitions can be used for variant filtering, annotation and prioritization or for focusing computational resources on hard-to-analyse regions of the genome.


VariantMetaCaller

Automatically integrates variant calling pipelines into a better-performing overall model that also predicts accurate variant probabilities. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates probabilities of variants. This novel method had significantly higher sensitivity and precision than the individual variant callers in all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports quantitative, precision-based filtering of variants under wider conditions. Specifically, the computed probabilities can be used to order the variants, and for a given threshold, the probabilities can be used to estimate precision. Precision can then be translated directly to the number of true called variants or, equivalently, to the number of false calls, which allows finding a problem-specific balance between sensitivity and precision.
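As a rough illustration of precision-based filtering, the sketch below assumes that each call carries a well-calibrated probability of being a true variant, so that the expected number of true calls above a threshold is the sum of their probabilities. The function names and the example calls are hypothetical, and this estimator is an assumption rather than the tool's documented procedure.

```python
def rank_by_probability(calls):
    """Order (variant_id, probability) pairs from most to least probable."""
    return sorted(calls, key=lambda call: call[1], reverse=True)


def expected_precision(calls, threshold):
    """Estimate precision and the expected true and false counts at a threshold.

    Returns (precision, expected_true, expected_false); precision is None
    when no call reaches the threshold.
    """
    kept = [prob for _, prob in calls if prob >= threshold]
    if not kept:
        return None, 0.0, 0.0
    expected_true = sum(kept)                 # sum of probabilities of kept calls
    precision = expected_true / len(kept)     # mean probability of kept calls
    expected_false = len(kept) - expected_true
    return precision, expected_true, expected_false


if __name__ == "__main__":
    # Hypothetical calls with made-up probabilities.
    calls = [("chr1:100", 0.9), ("chr1:200", 0.5), ("chr2:50", 0.7), ("chr3:10", 0.1)]
    print(rank_by_probability(calls))
    print(expected_precision(calls, threshold=0.5))
```

Raising the threshold keeps fewer calls with a higher estimated precision, which is the trade-off between sensitivity and precision that the description refers to.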