Duplication identification software tools | High-throughput sequencing data analysis
Segmental duplication or low-copy repeat. A segment of DNA >1 kb in size that occurs in two or more copies per haploid genome, with the different copies sharing >90% sequence identity. Segmental duplications are often variable in copy number and can therefore also be copy number variants (CNVs).
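As a rough illustration, the three criteria in this definition can be written as a simple check. This is a minimal sketch in Python; the class name `DuplicatedSegment`, its fields, and the function name are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicatedSegment:
    """Hypothetical record describing one family of repeated segments."""
    length_bp: int                  # length of the segment in base pairs
    copies_per_haploid_genome: int  # number of copies observed
    min_pairwise_identity: float    # lowest identity between copies, 0.0 to 1.0


def is_segmental_duplication(segment: DuplicatedSegment) -> bool:
    """Apply the definition: >1 kb, two or more copies, >90% identity."""
    return (
        segment.length_bp > 1000
        and segment.copies_per_haploid_genome >= 2
        and segment.min_pairwise_identity > 0.90
    )


candidate = DuplicatedSegment(length_bp=5000,
                              copies_per_haploid_genome=2,
                              min_pairwise_identity=0.95)
print(is_segmental_duplication(candidate))
```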
Allows structural variant (SV) discovery. LUMPY is a general probabilistic SV discovery framework that integrates multiple SV detection signals, including those generated from read alignments or prior evidence. The software is based on a general probabilistic representation of an SV breakpoint that allows any number of alignment signals to be integrated into a single discovery process. It can detect SVs from multiple alignment signals in files from one or more samples. A simplified wrapper for standard analyses, LUMPY Express, is also available.
Retrieves balanced and unbalanced forms of structural variation, such as deletions, tandem duplications, inversions and translocations. DELLY is based on a combination of short-range and long-range paired-end mapping and split-read analysis. It is useful for massively parallel sequencing (MPS) data from various sources, including deep whole-genome sequencing data and low-pass mate-pair sequencing data with longer inserts.
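The paired-end mapping idea can be sketched as follows. This is not DELLY's implementation; it is a minimal Python illustration of the commonly used paired-end signatures, assuming a standard short-insert library in which concordant mates point toward each other. The names `ReadPair` and `classify_pair`, and the `n_sd` cutoff, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal description of an aligned read pair."""
    chrom1: str
    pos1: int
    reverse1: bool  # True if mate 1 aligns to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool  # True if mate 2 aligns to the reverse strand


def classify_pair(pair: ReadPair, median_insert: float, insert_sd: float,
                  n_sd: float = 3.0) -> Optional[str]:
    """Return the SV type a pair hints at, or None if it looks concordant."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Order the mates by coordinate so orientation is read left to right.
    (left_pos, left_rev), (right_pos, right_rev) = sorted(
        [(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)]
    )
    if left_rev == right_rev:
        return "inversion"            # both mates on the same strand
    if left_rev and not right_rev:
        return "tandem_duplication"   # mates point away from each other
    # Mates point toward each other: only the distance can be abnormal.
    if right_pos - left_pos > median_insert + n_sd * insert_sd:
        return "deletion"
    return None


pair = ReadPair("chr1", 10_000, False, "chr1", 15_000, True)
print(classify_pair(pair, median_insert=300, insert_sd=30))
```

A real caller would cluster many such pairs and combine them with split-read evidence before reporting a variant; a single pair is only a hint.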
A data processing pipeline for copy number variations and aberrations (CNVs and CNAs) from next-generation sequencing (NGS) data. The package supplies functions to convert BAM files into read count matrices or genomic ranges objects, which are the input objects for cn.MOPS. Because cn.MOPS models the depth of coverage across samples at each genomic position, it is not affected by read count biases along chromosomes. Using a Bayesian approach, cn.MOPS decomposes the variation in read counts across samples into integer copy numbers, represented by its mixture components, and noise, represented by Poisson distributions.
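The idea of assigning integer copy numbers through Poisson mixture components can be sketched in a few lines. This is a heavily simplified illustration and not the cn.MOPS algorithm: it assumes the expected read count for copy number 2 (`diploid_rate`) is already known, uses a uniform prior, and handles copy number 0 with an arbitrary small background rate (`cn0_fraction`). All names and defaults are hypothetical.

```python
import math


def poisson_log_pmf(count: int, rate: float) -> float:
    """Log of the Poisson probability of observing `count` given `rate`."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def copy_number_posterior(read_count: int, diploid_rate: float,
                          max_cn: int = 8, cn0_fraction: float = 0.05):
    """Posterior over copy numbers 0..max_cn for one segment in one sample.

    Each copy number is one mixture component whose Poisson rate is
    proportional to the copy number (assumption: uniform prior).
    """
    log_weights = []
    for cn in range(max_cn + 1):
        scale = cn / 2 if cn > 0 else cn0_fraction
        log_weights.append(poisson_log_pmf(read_count, diploid_rate * scale))
    # Normalize in log space to avoid underflow.
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def call_copy_number(read_count: int, diploid_rate: float) -> int:
    """Most probable integer copy number under the sketch above."""
    posterior = copy_number_posterior(read_count, diploid_rate)
    return max(range(len(posterior)), key=posterior.__getitem__)


# Read counts for one segment across four samples, expected count 100 at CN2.
print([call_copy_number(c, 100.0) for c in (100, 150, 48, 0)])
```

In the actual method the model parameters are estimated jointly across samples rather than supplied, which is what lets the per-position modeling absorb position-specific count biases.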
Identifies genomic structural variations from paired-end and mate-pair sequencing data. SVDetect isolates and predicts intra- and inter-chromosomal rearrangements from paired-end/mate-pair reads produced by high-throughput sequencing technologies. The software first collects all pairs that are suspected to come from the same structural variant (SV). It then uses a sliding-window strategy to detect all groups of pairs that share a similar genomic location.
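A sliding-window grouping of anomalous pairs can be illustrated roughly as below. This is a sketch of the general technique, not SVDetect's code: each mate is assigned to every overlapping window that contains it, and pairs whose two mates fall in the same pair of windows are grouped into a candidate link. The function names, the tuple layout, and the `min_pairs` threshold are assumptions made for the example.

```python
from collections import defaultdict


def windows_covering(pos: int, window: int, step: int) -> range:
    """Start coordinates of all windows [start, start + window) containing pos."""
    last = (pos // step) * step
    first = max(0, ((pos - window) // step + 1) * step)
    return range(first, last + 1, step)


def group_pairs(pairs, window: int = 1000, step: int = 500, min_pairs: int = 2):
    """Group anomalous pairs by the windows their two mates fall into.

    `pairs` is an iterable of (chrom1, pos1, chrom2, pos2) tuples.
    Returns {(chrom1, start1, chrom2, start2): [pairs...]} for links
    supported by at least `min_pairs` pairs.
    """
    links = defaultdict(list)
    for pair in pairs:
        chrom1, pos1, chrom2, pos2 = pair
        for start1 in windows_covering(pos1, window, step):
            for start2 in windows_covering(pos2, window, step):
                links[(chrom1, start1, chrom2, start2)].append(pair)
    return {key: group for key, group in links.items() if len(group) >= min_pairs}


anomalous = [
    ("chr1", 1200, "chr5", 9100),
    ("chr1", 1350, "chr5", 9250),
    ("chr2", 50, "chr2", 70000),   # isolated pair, filtered out
]
for link, group in sorted(group_pairs(anomalous).items()):
    print(link, len(group))
```

Because the windows overlap, one cluster of pairs is reported under several neighboring window combinations; a full tool would merge these redundant links in a later step.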
A versatile variant caller for both DNA- and RNA-sequencing data. VarDict has many features that distinguish it from other variant callers, including performance that scales linearly with sequencing depth, intrinsic local realignment, built-in de-duplication, detection of polymerase chain reaction (PCR) artifacts, support for both DNA-seq and RNA-seq data, paired analysis to detect variant frequency shifts alongside somatic and loss of heterozygosity (LOH) variant detection, and structural variant (SV) calling. VarDict facilitates the application of next-generation sequencing in cancer research, enabling researchers to use one tool in place of a computationally expensive ensemble of tools.
A high-performance, robust tool and library for working with SAM, BAM and CRAM sequence alignment files, the most common file formats for aligned next-generation sequencing (NGS) data. Sambamba is a faster alternative to samtools that exploits multi-core processing to reduce processing time substantially. Sambamba is being adopted at sequencing centers, not only because of its speed, but also because of additional functionality, including coverage analysis and powerful filtering capability.
Enables discovering and genotyping structural variations using sequencing data. Genome STRiP performs discovery and genotyping of copy number variations (CNVs) by analyzing the data from many samples simultaneously in a population-based framework. The software can discover polymorphisms and produce genotypes. It can be used to find novel structural variations or to genotype known variants in new samples.