Deletion identification software tools | High-throughput sequencing data analysis
Advances in next-generation sequencing technologies and the availability of short-read data enable the detection of structural variations (SVs). Deletions, an important type of SV, have been associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. More generally, finding deletions from sequencing data remains challenging.
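The three deletion types can be told apart at a breakpoint with a small amount of sequence arithmetic. The following is a minimal sketch, not taken from any tool listed here: the function names and the half-open coordinate convention (the deletion removes `ref[start:end]`) are assumptions made for illustration, and microhomology is measured as the number of positions by which the breakpoint can be shifted left or right without changing the resulting sequence.

```python
def microhomology_length(ref, start, end):
    """Count how far a deletion of ref[start:end] can be shifted
    without changing the sequence that remains (hypothetical helper)."""
    # Shift right: deleting [start+1, end+1) is equivalent if ref[start] == ref[end].
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    # Shift left: deleting [start-1, end-1) is equivalent if ref[start-1] == ref[end-1].
    left = 0
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    return left + right


def classify_deletion(ref, start, end, inserted=""):
    """Label a deletion as 'microinsertion', 'microhomology' or 'blunt'."""
    if inserted:
        # Extra bases found at the junction that are absent from the reference.
        return "microinsertion"
    if microhomology_length(ref, start, end) > 0:
        return "microhomology"
    return "blunt"


print(classify_deletion("AAGCTTTTTGCAA", 2, 9))       # breakpoint is ambiguous by 2 bases
print(classify_deletion("AACCCGTT", 2, 5))            # no ambiguity at the junction
print(classify_deletion("AACCCGTT", 2, 5, "T"))       # junction carries an extra base
```

The ambiguity measured here is one reason the last two types are harder to detect: several breakpoint positions explain the same reads equally well.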
Identifies structural variations (SVs) by whole-genome de novo assembly. SOAPsv aims to show that SVs account for a greater fraction of the diversity between individuals than do single nucleotide polymorphisms (SNPs). This software also demonstrates that de novo assembly can detect SVs across a large range of lengths. The SV maps of human genomes make it possible to describe the genomic patterns of SVs and their relationship with a variety of genomic features.
Identifies somatic variation in tumor genomes. SMuFin uses direct comparison with the corresponding normal samples to detect, in a single run, somatic single-nucleotide variants (SNVs) and structural variants such as insertions, deletions, inversions and translocations of any size. This software can describe, at base-pair resolution, complex scenarios of chromosomal rearrangement such as chromoplexy and chromothripsis.
Enables discovering and genotyping structural variations using sequencing data. Genome STRiP performs discovery and genotyping of copy number variations (CNVs) by analyzing the data from many samples simultaneously in a population-based framework. The software can discover polymorphisms and produce genotypes. It can be used to find novel structural variations or to genotype known variants in new samples.
To characterize the mutational spectrum of somatic SVs in cancer, it is important to identify both simple (e.g., deletion, insertion, and inversion) and complex SVs at base-pair resolution. Meerkat predicts both germline and somatic SVs directly from short read data, focusing on complex events.
Provides systematic isolation of targeted deletions in the D. melanogaster genome. FRT Deletion Hunter can be useful to improve the Stock Center deletion collection. It is able to generate small custom deletions with predictable endpoints throughout the genome. The tool permits the user to specify the polytene segment, the genomic coordinates, a particular gene deletion and to specify which deletion to apply to the gene.
A tool designed for efficient and accurate variant-detection in high-throughput sequencing data. By using local realignment of reads and local assembly it achieves both high sensitivity and high specificity. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on whole-genome, exon-capture, and targeted capture data.
A Perl/C++ package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads. BreakDancer sensitively and accurately detects indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.
Helps users infer the underlying genotype of each structural variant (SV). SVTyper is a Bayesian likelihood algorithm that can operate on copy-neutral events such as inversions and translocations as well as copy number variants (CNVs). It produces SV genotypes, which are useful for meaningful variant interpretation, as well as quantitative estimates of breakpoint allele frequencies that allow inference of the fraction of tumor cells carrying a particular variant.
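To illustrate what a Bayesian likelihood genotyper does in general terms, here is a minimal sketch based on a binomial model of reference-supporting and variant-supporting read counts. It is not SVTyper's implementation: the function name, the expected variant-read fractions per genotype and the flat prior are all placeholder assumptions.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, alt_fraction=(0.05, 0.5, 0.95)):
    """Posterior over (hom-ref, het, hom-alt) under a flat prior.

    alt_fraction holds the assumed probability that a read supports the
    variant under each genotype (placeholder values, not SVTyper's).
    """
    n = ref_reads + alt_reads
    likelihoods = [
        comb(n, alt_reads) * p ** alt_reads * (1 - p) ** ref_reads
        for p in alt_fraction
    ]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


labels = ("0/0", "0/1", "1/1")
for ref_count, alt_count in [(20, 0), (10, 10), (0, 20)]:
    posteriors = genotype_posteriors(ref_count, alt_count)
    best = labels[posteriors.index(max(posteriors))]
    print(ref_count, alt_count, best)
```

The ratio `alt_reads / (ref_reads + alt_reads)` is a crude stand-in for the breakpoint allele frequency mentioned above.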
A tool to generate local assemblies of breakpoints genome-wide. novoBreak is an algorithm used in cancer genomic studies to discover the breakpoints of structural variants (both somatic and germline) in whole-genome sequencing data. The assemblies produced by novoBreak are based on clusters of reads that share a set of short nucleotide stretches of length K (K-mers) present in a subject genome but not in the reference genome or control data.
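The K-mer idea can be sketched in a few lines: collect the K-mers of the reference, then keep the K-mers of the subject reads that are absent from it and group reads by those K-mers. This is a simplified illustration under stated assumptions, not novoBreak's code: the function names are hypothetical, and reverse complements, sequencing errors and control samples are ignored.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reads_by_novel_kmer(reads, reference, k):
    """Map each K-mer absent from the reference to the indices of reads containing it."""
    reference_kmers = kmers(reference, k)
    clusters = defaultdict(set)
    for index, read in enumerate(reads):
        for kmer in kmers(read, k) - reference_kmers:
            clusters[kmer].add(index)
    return dict(clusters)


reference = "ACGTACGTAC"
reads = ["ACGTTTGTAC", "ACGTACGT"]  # the first read differs from the reference
clusters = reads_by_novel_kmer(reads, reference, k=4)
print(sorted(clusters))
```

Reads that share novel K-mers are candidates for a common local assembly; a read that matches the reference contributes nothing.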
Integrates prior knowledge about the characteristics of structural variants (SVs). forestSV is a statistical learning approach, based on Random Forests (RFs) that leads to improved discovery in high throughput sequencing (HTS) data. This application offers high sensitivity and specificity coupled with the flexibility of a data-driven approach. It is particularly well suited to the detection of rare variants because it is not reliant on finding variant support in multiple individuals.
A computational tool for copy number variant (CNV) detection in whole human genome sequence data using read depth (RD) coverage. CNV detection is based on the Event-Wise Testing (EWT) algorithm. The read depth coverage is estimated in non-overlapping intervals (100 bp windows) across an individual genome, based on the pileup generated by SAMtools.
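The windowing step can be illustrated as follows. This sketch only averages depth in non-overlapping 100 bp windows; it does not reproduce the EWT algorithm. The input format (a plain list of per-base depths, as could be parsed from a pileup) and the function name are assumptions.

```python
def window_depth(per_base_depth, window=100):
    """Mean depth in consecutive non-overlapping windows.

    A trailing partial window is dropped so every value covers `window` bases.
    """
    means = []
    for start in range(0, len(per_base_depth) - window + 1, window):
        chunk = per_base_depth[start:start + window]
        means.append(sum(chunk) / window)
    return means


# Toy data: 200 bases at depth 30, then 100 bases at depth 15, then 50 at depth 30.
depth = [30] * 200 + [15] * 100 + [30] * 50
print(window_depth(depth))
```

A run of windows with roughly half the typical depth, as in the third window of the toy data, is the kind of signal a read-depth method would go on to test statistically.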
Detects and genotypes insertions and deletions from paired-end reads. CTK is a suite of tools for next-generation sequencing (NGS) data analysis and is based on an internal segment size approach to discover indel variation from paired-end read data. It also contains, among others, a long-indel-aware read mapper (LASER), a converter from BAM to a list of alignment pairs with prior probabilities, and a feature to split data by chromosome.
Allows structural variant (SV) discovery. LUMPY is a general probabilistic SV discovery framework that integrates multiple SV detection signals, including those generated from read alignments or prior evidence. The software is based upon a general probabilistic representation of an SV breakpoint that allows any number of alignment signals to be integrated into a single discovery process. It can detect SVs from multiple alignment signals in files from one or more samples. A simplified wrapper for standard analyses, LUMPY Express, is also available.
Delineates de novo copy number deletions simultaneously across multiple trios from targeted sequencing (TS) data. MDTS is designed to limit false positives and employs non-uniformly sized bins based on read depth instead of uniform, non-overlapping bins defined by the number of nucleotide base pairs. Moreover, it exploits the trio design by using a “minimum distance” statistic to quantify differences in read depths between the offspring and the parents, thereby reducing shared sources of technical variation.
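One way to read the “minimum distance” statistic is, per bin, the signed difference between the offspring's normalized depth and that of whichever parent is closer. The sketch below follows that reading; it is an assumption made for illustration rather than MDTS's exact definition, and the function name and the use of log2 depth ratios as input are hypothetical.

```python
def minimum_distance(offspring, mother, father):
    """Per-bin signed difference between the offspring and the closer parent."""
    distances = []
    for child, mom, dad in zip(offspring, mother, father):
        to_mother = child - mom
        to_father = child - dad
        # Keep the difference with the smaller magnitude.
        distances.append(to_mother if abs(to_mother) <= abs(to_father) else to_father)
    return distances


# Toy log2 depth ratios for three bins:
# bin 1: nobody carries a deletion; bin 2: de novo deletion in the offspring;
# bin 3: deletion inherited from the mother.
offspring = [0.0, -1.0, -1.0]
mother = [0.0, 0.0, -1.0]
father = [0.1, 0.0, 0.0]
print(minimum_distance(offspring, mother, father))
```

Only the de novo bin keeps a value far from zero, because an inherited deletion is matched by one parent and technical noise shared by the family cancels in the difference.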
A computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. The package is composed of three modules: PEMer workflow, SV-Simulation and BreakDB. PEMer workflow is sensitive software for detecting SVs from paired-end sequence reads. SV-Simulation randomly introduces SVs into a given genome and generates simulated paired-end reads from the ‘novel’ genome. Subsequent analysis of the simulated reads with PEMer workflow can help to parameterize PEMer workflow. BreakDB is a web-accessible database developed to store, annotate and display SV breakpoint events identified by PEMer and from other sources.
Identifies structural variant (SV) breakpoint junctions by clustering split reads. NanoSV first orders all mapped segments of each split read by their positions within the originally sequenced read. This tool uses split-read mapping to discover all defined types of SVs. It finishes by gathering evidence from different reads supporting the same candidate breakpoint junction. NanoSV is suited to Nanopore and Pacific Biosciences data.
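The ordering step described above can be sketched as follows: sort the aligned segments of one split read by where they start in the read, then treat each pair of consecutive segments as a candidate breakpoint junction. This is a simplified illustration, not NanoSV's code: the `Segment` fields and function name are hypothetical, and strand and mapping quality are ignored.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    read_start: int  # offset of the segment within the sequenced read
    chrom: str       # reference sequence the segment maps to
    ref_start: int   # first reference position covered
    ref_end: int     # last reference position covered


def candidate_junctions(segments):
    """Pair consecutive segments (in read order) into candidate junctions."""
    ordered = sorted(segments, key=lambda segment: segment.read_start)
    return [
        ((left.chrom, left.ref_end), (right.chrom, right.ref_start))
        for left, right in zip(ordered, ordered[1:])
    ]


# One read split into two segments, given here out of read order.
split_read = [
    Segment(read_start=500, chrom="chr1", ref_start=2000, ref_end=2400),
    Segment(read_start=0, chrom="chr1", ref_start=1000, ref_end=1500),
]
print(candidate_junctions(split_read))
```

In this toy read, adjacent read positions map 500 bases apart on the reference, which is the pattern a deletion leaves. Junctions reported by several reads at nearby coordinates would then be clustered as evidence for the same event.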
Offers a method for the detection of structural variants (SVs). GASVPro proposes a probabilistic model, able to consider inversions and reciprocal translocations, that combines paired-read and read-depth signals. It provides a method able to handle reads with multiple possible alignments. This program can report (i) the uncertainty in predicted breakpoints and (ii) whether a variant can be classified as homozygous or heterozygous.
Allows identification of genomic rearrangements. GRIDSS is a modular software suite containing tools that perform genome-wide break-end assembly prior to variant calling, using a positional de Bruijn graph assembler. The GRIDSS pipeline comprises three distinct stages: extraction, assembly, and variant calling. The software identifies non-template sequence insertions, microhomologies and large imperfect homologies, and supports multi-sample analysis.
Identifies regions of the genome suspected to harbor a complex event. SVelter then resolves the structure by iteratively rearranging the local genome structure, in a randomized fashion, with each structure scored against characteristics of the observed sequencing data. SVelter is able to accurately reconstruct complex chromosomal rearrangements when compared to well-characterized genomes that have been deeply sequenced with both short and long reads. SVelter is able to interrogate many different types of rearrangements, including multi-deletion and duplication-inversion-deletion events as well as distinct overlapping variants on homologous chromosomes.