Single nucleotide variant identification software tools | High-throughput sequencing data analysis
With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity and sequencing and analysis errors.
Focuses on variant discovery and genotyping. GATK is a toolkit, developed at the Broad Institute, composed of several tools and able to support projects of any size. The application bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in various formats such as SAM, BAM, CRAM or VCF. The website includes extensive documentation to guide users.
Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting SAM/BAM/CRAM format. It limits the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.
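To illustrate what "per-position information" means, here is a minimal Python sketch that derives read depth per reference position from alignment start coordinates and CIGAR strings. It is not SAMtools code: the function name `per_position_depth` and the simplified `(start, cigar)` input are assumptions made for illustration, and deleted or skipped bases are assumed not to count towards depth.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def per_position_depth(alignments):
    """Count aligned bases per reference position.

    `alignments` is an iterable of (start, cigar) pairs, where `start`
    is the 1-based leftmost reference position (as in the SAM POS field).
    """
    depth = Counter()
    for start, cigar in alignments:
        ref_pos = start
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":
                # Match/mismatch: consumes reference and contributes to depth.
                for pos in range(ref_pos, ref_pos + length):
                    depth[pos] += 1
                ref_pos += length
            elif op in "DN":
                # Deletion or skipped region: consumes reference, no depth
                # (an assumption of this sketch).
                ref_pos += length
            # I, S, H and P do not consume reference positions.
    return dict(depth)

reads = [(1, "5M"), (3, "2M1D2M")]
print(per_position_depth(reads))
```

In practice the records would come from a SAM/BAM/CRAM parser rather than a hand-written list.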
Identifies somatic variation in tumor genomes. SMuFin directly compares tumor samples with the corresponding normal samples to detect, in a single run, somatic single-nucleotide variants (SNVs) and structural variants such as insertions, deletions, inversions and translocations of any size. The software can describe complex chromosomal rearrangements such as chromoplexy and chromothripsis at base-pair resolution.
Automates somatic variant refinement. DeepSVR is a model that performs systematized and standardized somatic variant refinement using a machine learning approach. It was built on a training dataset of 41,000 variants from 21 studies, with 440 cases derived from nine cancer subtypes. It aims to reduce a bottleneck in cancer genomic analysis while improving reproducibility and inter-lab comparability in genomic studies and in clinical settings.
Aligns reads to a reference genome and calls single nucleotide variations (SNVs). marginAlign is designed for Oxford Nanopore reads. Its package includes a short-read aligner; marginCaller, a program that calls SNVs; and marginStats, which calculates simple statistics such as alignment identity, coverage, and insertion or deletion rates on a SAM file.
Identifies DNA modifications in cases where 5-mC can be distinguished from cytosine by careful analysis of the electrical current signals. Nanopolish computes the log-likelihood ratio between an unmethylated version of a reference genome substring and a version that contains at least one ‘CG’ dinucleotide. It relies on a signal-level hidden Markov model (HMM). The tool can also increase consensus accuracy around homopolymers.
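The idea of a log-likelihood ratio over current signals can be sketched in a few lines of Python. This is a simplified illustration and not Nanopolish's implementation: it assumes each observed current level has already been assigned to one position, that each position emits a Gaussian-distributed current, and that the two hypotheses being compared are a methylated and an unmethylated version of the same substring. The function names and all `(mean, sd)` values are hypothetical. The real tool uses a hidden Markov model rather than a fixed assignment.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def log_likelihood(currents, model):
    """Sum of per-position log densities; `model` is a list of (mean, sd)."""
    return sum(gaussian_logpdf(x, m, s) for x, (m, s) in zip(currents, model))

def log_likelihood_ratio(currents, methylated_model, unmethylated_model):
    """Positive values favour the methylated hypothesis."""
    return (log_likelihood(currents, methylated_model)
            - log_likelihood(currents, unmethylated_model))

# Hypothetical emission parameters (pA) for two consecutive positions.
unmethylated = [(100.0, 2.0), (95.0, 2.0)]
methylated = [(103.0, 2.0), (97.0, 2.0)]
observed = [102.8, 97.1]

llr = log_likelihood_ratio(observed, methylated, unmethylated)
print(f"log-likelihood ratio: {llr:.3f}")
```

A threshold on the ratio would then decide whether a site is called methylated; the choice of threshold is left open here.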
A statistical method for detecting and genotyping single-nucleotide variants in single-cell data. Monovar exhibited superior performance over standard algorithms on benchmarks and in identifying driver mutations and delineating clonal substructure in three different human tumor data sets. Monovar is capable of analyzing large-scale data sets and handling different whole-genome amplification (WGA) protocols, and thus it is well suited for addressing the growing need for accurate single-cell DNA variant detection.
A software tool for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis and fragment information to identify the parental origin of germ-line mutations.
A platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. It can be used to detect different types of variation: 1) germline variants (SNPs and indels) in individual samples or pools of samples, 2) multi-sample variants (shared or private) in multi-sample datasets (with mpileup), 3) somatic mutations, LOH events, and germline variants in tumor-normal pairs and 4) somatic copy number alterations (CNAs) in tumor-normal exome data.
A versatile machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data. SNooPer trains a data-specific model on a subset of variant positions from the sequencing output for which the class (true variation or sequencing error) is known. During the training phase, using a real dataset of 40 childhood acute lymphoblastic leukemia patients, the SNooPer algorithm was shown to be unaffected by low coverage or low variant allele frequencies, so it can be used to reduce overall sequencing costs while maintaining high specificity and sensitivity in somatic variant calling.
Offers a platform for population-level analyses. dDocent is open-source software dedicated to processing individually barcoded restriction-site associated DNA sequencing (RADseq) data. The application employs data reduction techniques and interacts with other programs to offer features such as de novo assembly of RAD loci, single nucleotide polymorphism (SNP) and indel calling, quality trimming and baseline data filtering.
Discovers variants in low-mappability regions. GeneticThesaurus consists of a personalized filtering strategy taking thesaurus annotations into account. It improves the detection of DNA changes across matched samples. This tool allows users to characterize the landscape of mutations in sequence-similar regions of the human genome using short-read sequencing data. It is useful for studying genomics of cancer and of familial diseases.
Provides analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. Strelka is a variant calling method intended to increase the efficiency of variant calling for both germline and somatic analysis. It includes a germline caller that supplies read-backed phasing, an alignment-based haplotyping approach at each variant locus, and the ability to analyze input sequencing data using a mixture-model indel error estimation method.
Investigates non-coding elements and protein-coding genes. ncdDetect implements a two-step algorithm in which sample-specific calculations are followed by computations across all samples in the dataset. It is useful for understanding the underlying mechanisms of tumorigenesis. This tool can model the heterogeneous neutral background mutation-rate taking genomic annotations known to correlate with the mutation rate into account.
Extracts causative variants in familial and sporadic genetic diseases. VariantMaster implements a methodology to evaluate the status (presence or absence) of a variant in familial or case-control contexts. The software allows users to identify causative variants in familial, sporadic germline, and somatic genetic disorders, including cancers. It also allows for the search of causative variants in one or more recurrently mutated genes in a pool of unrelated individuals sharing the same phenotype.
Estimates sample composition, or the level of contamination of a disease sample, accurately and without genotyping. Virmid is a probabilistic method for single nucleotide variation (SNV) calling. This application increases genotyping accuracy, especially for somatic mutation profiling, by rigorously integrating the sample composition parameter into the SNV calling model. The robustness of this application makes it applicable for identifying mutations in other challenging cases.
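To show how a sample composition parameter can enter a calling model, here is a rough Python sketch. It is not Virmid's algorithm. It assumes a set of sites that are heterozygous somatic variants in the disease cells and absent from normal cells, so that with a normal-cell fraction `alpha` the expected variant allele fraction is `0.5 * (1 - alpha)`. The name `estimate_alpha` and the grid search are illustrative assumptions.

```python
import math

def binom_loglik(alt, depth, p):
    """Binomial log-likelihood up to a constant that does not depend on p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
    return alt * math.log(p) + (depth - alt) * math.log(1 - p)

def estimate_alpha(sites, steps=1000):
    """Grid-search maximum-likelihood estimate of the normal-cell fraction.

    `sites` is a list of (alt_read_count, total_depth) pairs.
    """
    best_alpha, best_ll = 0.0, float("-inf")
    for i in range(steps + 1):
        alpha = i / steps
        expected_vaf = 0.5 * (1 - alpha)  # assumed heterozygous somatic sites
        ll = sum(binom_loglik(alt, depth, expected_vaf) for alt, depth in sites)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

sites = [(30, 100), (28, 100), (32, 100)]
print(estimate_alpha(sites))
```

Once `alpha` is estimated, the same expected allele fraction can be used when scoring candidate genotypes at other sites, which is the sense in which composition is integrated into calling.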
A somatic point mutation caller for tumor-normal paired samples in next-generation sequencing (NGS) data. MuSE models the evolution of the reference allele to the allelic composition of the matched tumor and normal tissue at each genomic locus. To improve overall accuracy, it further adopts a sample-specific error model to identify cutoffs, reflecting the variation in tumor heterogeneity among samples.
A somatic mutation detection pipeline that implements a stochastic boosting algorithm to produce highly accurate calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier that predicts mutation status.
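The adaptive boosting step can be illustrated with a small Python sketch. This is a textbook AdaBoost over one-level decision trees (stumps), not the pipeline's own code. The two features per candidate site (variant allele fraction and mean base quality), the toy labels, and all function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity

def adaboost(X, y, rounds=5):
    """Train an ensemble of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified sites, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy training set: (variant allele fraction, mean base quality);
# +1 = true somatic, -1 = artifact.
X = [(0.30, 35), (0.25, 30), (0.40, 33), (0.03, 34), (0.05, 12), (0.02, 15)]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y)
print([predict(model, row) for row in X])
```

A production classifier would use many more features and deeper trees; the sketch only shows the reweighting loop that makes the boosting adaptive.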
Performs variant discovery on the Amazon Web Services (AWS) cloud or on local high-performance computing clusters. GenomeVIP is a genomics analysis pipeline for cloud computing that supports germline and somatic calling. It provides a collection of analysis tools and computational frameworks for streamlined discovery and interpretation of genetic variants. The server and runtime environments can be customized, updated, or extended.