

Integrates prior knowledge about the characteristics of structural variants (SVs). forestSV is a statistical learning approach, based on Random Forests (RFs), that leads to improved SV discovery in high-throughput sequencing (HTS) data. This application offers high sensitivity and specificity coupled with the flexibility of a data-driven approach. It is particularly well suited to the detection of rare variants because it does not rely on finding variant support in multiple individuals.


Formalizes and automates the identification of clusters of tandemly duplicated genes (CTDGs) by examining the physical distribution of individual members of families of duplicated genes across chromosomes. Application of CTDGFinder accurately identified CTDGs for many well-known gene clusters (e.g., Hox and beta-globin gene clusters) in the human, mouse and 20 other mammalian genomes. Examination of human genes showing tissue-specific enhancement of their expression by CTDGFinder identified members of several well-known gene clusters (e.g., cytochrome P450s and olfactory receptors) and revealed that they were unequally distributed across tissues. By formalizing and automating CTDG identification, CTDGFinder will facilitate understanding of CTDG evolutionary dynamics, their functional implications, and how they are associated with phenotypic diversity.

cn.MOPS / Copy number estimation by a Mixture Of PoissonS

A data processing pipeline for copy number variations and aberrations (CNVs and CNAs) from next generation sequencing (NGS) data. The package supplies functions to convert BAM files into read count matrices or genomic ranges objects, which are the input objects for cn.MOPS. It models the depths of coverage across samples at each genomic position. Therefore, it does not suffer from read count biases along chromosomes. Using a Bayesian approach, cn.MOPS decomposes read variations across samples into integer copy numbers and noise by its mixture components and Poisson distributions, respectively.
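The mixture idea described above can be illustrated with a minimal sketch. This is not the cn.MOPS implementation or its API: the function names, the uniform prior, the `eps` floor for copy number 0 and the assumption that the expected count scales as `lam * cn / 2` are all illustrative choices. Given one read count and a baseline rate `lam` for the diploid state, it returns a posterior over integer copy numbers under Poisson components.

```python
import math


def poisson_logpmf(k, mean):
    """Log of the Poisson probability of observing k events with the given mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def copy_number_posterior(count, lam, copy_numbers=(0, 1, 2, 3, 4), prior=None, eps=0.05):
    """Posterior over integer copy numbers for one read count.

    Assumptions (illustrative only): the expected count for copy number cn is
    lam * cn / 2, copy number 0 uses lam * eps to avoid a zero mean, and the
    prior is uniform unless one is supplied.
    """
    if prior is None:
        prior = [1.0 / len(copy_numbers)] * len(copy_numbers)
    log_terms = []
    for cn, p in zip(copy_numbers, prior):
        mean = lam * (cn / 2 if cn > 0 else eps)
        log_terms.append(math.log(p) + poisson_logpmf(count, mean))
    # Normalise in log space for numerical stability.
    top = max(log_terms)
    weights = [math.exp(v - top) for v in log_terms]
    total = sum(weights)
    return {cn: w / total for cn, w in zip(copy_numbers, weights)}


if __name__ == "__main__":
    posterior = copy_number_posterior(count=52, lam=100.0)
    print(max(posterior, key=posterior.get))
```

In this toy version the rate `lam` is given; estimating it jointly across samples is the part the actual package handles.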


A versatile variant caller for both DNA- and RNA-sequencing data. VarDict contains many features that are distinct from other variant callers, including linear performance to depth, intrinsic local realignment, built-in capability of de-duplication, detection of polymerase chain reaction (PCR) artifacts, accepting both DNA- and RNA-seq, paired analysis to detect variant frequency shifts alongside somatic and loss of heterozygosity (LOH) variant detection and structural variant (SV) calling. VarDict facilitates application of next-generation sequencing in cancer research, enabling researchers to use one tool in place of an alternative computationally expensive ensemble of tools.


An algorithm that extends the univariate SLM to the multivariate case in order to detect recurrent shifts in the mean of multiple sequential processes. The resolution of JointSLM depends strictly on the signal-to-noise ratio (SNR) of the data: increasing the SNR of depth-of-coverage (DOC) data, by reducing the sequencing error rate or increasing the coverage of the sequencing experiments, will improve the performance of JointSLM in detecting small shifts in the signals. The JointSLM algorithm can also be used to analyse data from multiple tumour samples to discover recurrent copy number alterations.


This package for R can detect copy number aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. It can also be used to infer copy number using reads obtained from bisulfite sequencing experiments.


Provides a structural variation (SV) caller for long reads. Sniffles is mainly designed for PacBio reads, but also works on Oxford Nanopore reads. SVs are larger events on the genome (e.g. deletions, duplications, insertions, inversions and translocations). Sniffles can detect all of these types and more, such as nested SVs (e.g. an inversion flanked by deletions or an inverted duplication). Furthermore, Sniffles incorporates multiple auto-tuning functions that determine dataset-dependent parameters to reduce the overall risk of falsely inferring SVs.

DIGTYPER / Duplication and Inversion GenoTYPER

A method to genotype tandem duplications and inversions. DIGTYPER computes genotype likelihoods for a given inversion or duplication and reports the maximum likelihood genotype. In contrast to purely coverage-based approaches, DIGTYPER uses breakpoint-spanning read pairs as well as split alignments for genotyping, which also enables typing of small events. We tested our approach on simulated and on real data and compared the genotype predictions to those made by DELLY, which discovers SVs and computes genotypes. DIGTYPER compares favorably, especially for duplications (of all lengths) and for shorter inversions (up to 300 bp). In contrast to DELLY, our approach can genotype SVs from databases without having to rediscover them.
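As a rough illustration of maximum likelihood genotyping from breakpoint evidence, the sketch below scores three genotypes with a simple binomial model over reads that support the reference or the variant breakpoint. The model, the `error` parameter and the function names are assumptions made for this example; the description above does not specify DIGTYPER's actual likelihood.

```python
import math


def genotype_log_likelihoods(ref_reads, alt_reads, error=0.01):
    """Binomial log-likelihoods of three genotypes given read support counts.

    Assumption (illustrative only): the expected fraction of variant-supporting
    reads is `error` for 0/0, 0.5 for 0/1 and 1 - error for 1/1.
    """
    expected_alt_fraction = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    n = ref_reads + alt_reads
    log_choose = math.lgamma(n + 1) - math.lgamma(alt_reads + 1) - math.lgamma(ref_reads + 1)
    return {
        genotype: log_choose + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for genotype, p in expected_alt_fraction.items()
    }


def call_genotype(ref_reads, alt_reads, error=0.01):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_log_likelihoods(ref_reads, alt_reads, error)
    return max(likelihoods, key=likelihoods.get)


if __name__ == "__main__":
    # 10 read pairs support the reference, 9 span the variant breakpoint.
    print(call_genotype(ref_reads=10, alt_reads=9))
```

Because the evidence is counted at the breakpoints and not across the whole event, a model of this kind does not depend on event length, which is consistent with the claim that small events can be typed.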


A tool designed to jointly detect copy number variations (CNVs) from whole genome sequencing data in parent-offspring trios. TrioCNV models the read depth signal with negative binomial regression to accommodate over-dispersion and considers GC content and mappability bias. It leverages the parent-offspring relationship to apply a Mendelian inheritance constraint while allowing for the rare incidence of de novo events. It uses a hidden Markov model (HMM) that combines the two aforementioned models to jointly perform CNV segmentation for the trio.
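The Mendelian constraint with a de novo allowance can be sketched as a transmission probability table. This is a simplified illustration, not TrioCNV's model: it is restricted to copy numbers 0, 1 and 2 (deletions only), assumes each parent passes on one of two haplotypes with equal probability, and spreads a hypothetical `de_novo` rate uniformly over the child states.

```python
from itertools import product

# Per-haplotype copies for each total copy number (deletion states only).
# Assumption for this sketch: copy number 1 means one haplotype carries the deletion.
HAPLOTYPES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def transmission_prob(child, father, mother, de_novo=1e-4):
    """P(child copy number | parental copy numbers) with a small de novo term."""
    combos = list(product(HAPLOTYPES[father], HAPLOTYPES[mother]))
    mendelian = sum(1 for f, m in combos if f + m == child) / len(combos)
    # Mix the Mendelian probability with a uniform de novo component.
    return (1.0 - de_novo) * mendelian + de_novo / len(HAPLOTYPES)


if __name__ == "__main__":
    for child in (0, 1, 2):
        print(child, transmission_prob(child, father=1, mother=2))
```

A child state that violates Mendelian inheritance, such as a homozygous deletion with two diploid parents, keeps a small non-zero probability, so de novo events remain possible without dominating the calls.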


Detects structural variants in cancer using whole genome sequencing data with or without a matched normal control sample. SV-Bay not only uses information about abnormal read mappings but also assesses changes in the copy number profile and tries to associate these changes with candidate SVs. The likelihood of each novel genomic adjacency is evaluated using a Bayesian model. In its final step, SV-Bay annotates genomic adjacencies according to their type and, where possible, groups detected genomic adjacencies into complex SVs such as balanced translocations, co-amplifications, and so on. A comparison of SV-Bay with BreakDancer, Lumpy, DELLY and GASVPro demonstrated its superior performance on both simulated and experimental datasets.


Calls structural variants (SVs) and indels from mapped paired-end sequencing reads. Manta is optimized for analysis of individuals and tumor/normal sample pairs, calling SVs, medium-sized indels and large insertions within a single workflow. The method is designed for rapid analysis on standard computer hardware: NA12878 at 50x genomic coverage is analyzed in less than 20 minutes on a 20-core server, and most WGS tumor-normal analyses can be completed within 2 hours. Manta combines paired and split-read evidence during SV discovery and scoring to improve accuracy, but does not require split reads or successful breakpoint assemblies to report a variant in cases where there is strong evidence otherwise. It provides scoring models for germline variants in individual diploid samples and somatic variants in matched tumor-normal sample pairs.

STRIDE / Species Tree Root Inference from gene Duplication Events

Identifies sets of well-supported gene duplication events from cohorts of gene trees. STRIDE is an algorithm developed to analyze these duplication events to infer a probability distribution over an unrooted species tree for the location of the true root. This package correctly identifies the community-accepted root of the majority of species trees and effectively captures uncertainty in root placement when data is limited or conflicting.


An integrated structural variation (SV) caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes.
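Merging calls from several tools is often done by reciprocal overlap, and the sketch below shows that generic technique. It is not MetaSV's merging procedure, which the description above does not detail: the 50% threshold, the `(start, end)` tuple representation and the function names are assumptions for illustration, and calls are taken to be of the same SV type on the same chromosome.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def merge_calls(calls, min_overlap=0.5):
    """Greedily merge sorted (start, end) calls that reciprocally overlap.

    Assumption (illustrative only): a merged call spans the union of its members.
    """
    merged = []
    for call in sorted(calls):
        if merged and reciprocal_overlap(merged[-1], call) >= min_overlap:
            previous = merged[-1]
            merged[-1] = (min(previous[0], call[0]), max(previous[1], call[1]))
        else:
            merged.append(call)
    return merged


if __name__ == "__main__":
    # Two tools report nearly the same deletion; a third call is unrelated.
    print(merge_calls([(100, 200), (110, 210), (500, 600)]))
```

Taking the union of breakpoints is the crudest choice; the entry above describes local assembly as the step that refines breakpoint positions.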


A tool that achieves drastically higher detection accuracy than existing tools, on both simulated and real mate-pair sequencing datasets from the 1000 Genomes Project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low-frequency variants. SV detection performed on a large-insert mate-pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well.

SHEAR / Sample Heterogeneity Estimation and Assembly by Reference

A tool for next-generation sequencing data analysis that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis. By utilizing structural variant detection algorithms, SHEAR also offers improved performance in the form of a stronger ability to handle difficult structural variant types and improved computational efficiency.

GECKO-CSB / GEnome Comparison with K-mers Out-of-core-Computational Synteny Block

A package which detects and identifies blocks of large rearrangements, taking into account repeats, tandem repeats and duplications, starting from a simple collection of ungapped local alignments. GECKO-CSB formalizes linearity and collinearity properties in a computational synteny block (CSB) framework. These properties are useful not only to detect CSBs but also to detect and identify evolutionary events. GECKO-CSB is the first method to approach the whole process as a coherent workflow, thus outperforming current state-of-the-art software tools, and it additionally allows the type of rearrangement to be classified. GECKO-CSB is part of the GECKO software suite.

ERDS / Estimation by Read Depth with Single-nucleotide variants

An open-source software tool, free to academia and non-profit organizations, designed for inferring copy number variants (CNVs) in high-coverage human genomes using next generation sequencing (NGS) data. When a CNV is present in a test genome, multiple signatures, weak or strong, appear in the alignment data. ERDS starts from read depth (RD) information and integrates other information, including paired-end mapping (PEM) and soft-clip signatures, to call CNVs sensitively and accurately.

SV-AUTOPILOT / Structural Variation AUTOmated PIpeLine Optimization Tool

Standardizes the Structural Variation (SV) detection pipeline. SV-AUTOPILOT is a pipeline that can be used on existing computing infrastructure in the form of a Virtual Machine (VM) Image. It provides a “meta-tool” platform for using multiple SV-tools, to standardize benchmarking of tools, and to provide an easy, out-of-the-box SV detection program. In addition, the user can choose which of several alignment algorithms is used in their analysis.