Tumor purity and clonality estimation software tools | Whole-genome sequencing data analysis
Solid tumor samples typically contain multiple distinct clonal populations of cancer cells, as well as contaminating stromal and immune cells. Most cancer genomics and transcriptomics studies do not explicitly account for genetic heterogeneity and impurity, and draw inferences from mixed populations of cells. Deconvolution of genomic data from heterogeneous samples provides a powerful tool to address this limitation.
A tool for inferring the cellular frequency of point mutations from deeply sequenced data. The model supports simultaneous analysis of multiple related samples and infers clusters of mutations whose cellular prevalences shift together. Such clusters can be interpreted as the mutational genotypes of distinct clonal populations. The input data for PyClone consist of a set of read counts from a deep sequencing experiment, the copy number of the genomic region containing each mutation, and an estimate of tumour content.
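The relationship between these three inputs can be illustrated with a simplified point estimate. The sketch below is not PyClone's Bayesian clustering model; it only inverts the standard expected-allele-fraction formula for a single mutation. The function name, the `multiplicity` parameter (number of mutated copies per mutated cell) and the default normal copy number of 2 are assumptions made for illustration.

```python
def cellular_prevalence(ref_reads, alt_reads, purity, tumour_cn,
                        multiplicity=1, normal_cn=2):
    """Point estimate of the fraction of tumour cells carrying a mutation.

    Assumes the expected variant allele fraction (VAF) is
        purity * multiplicity * prevalence
        / (purity * tumour_cn + (1 - purity) * normal_cn)
    and solves that expression for prevalence.
    """
    depth = ref_reads + alt_reads
    if depth <= 0:
        raise ValueError("total read depth must be positive")
    if not 0 < purity <= 1:
        raise ValueError("purity must lie in (0, 1]")
    if multiplicity <= 0 or tumour_cn <= 0:
        raise ValueError("copy numbers must be positive")

    vaf = alt_reads / depth
    # Average number of copies of the locus per cell in the mixed sample.
    average_copies = purity * tumour_cn + (1 - purity) * normal_cn
    prevalence = vaf * average_copies / (purity * multiplicity)
    # Sampling noise can push the raw estimate above 1; cap it.
    return min(prevalence, 1.0)
```

Under these assumptions, 15 variant reads out of 100 in a diploid region of a sample with 60% tumour content correspond to a cellular prevalence of about 0.5, i.e. a mutation present in roughly half of the tumour cells.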
Renders complex relationships between cancer evolution data in an intuitive, interactive framework for biomedical investigators. E-scape is composed of three visualization tools: TimeScape, MapScape and CellScape. By combining all the necessary components, it makes it possible to study the dynamics of disease progression. The suite can help researchers engage effectively with cancer evolution data sets, and it contributes to the understanding of clonal evolution in cancer towards translation into the clinical domain.
Models tumor in normal (TiN) contamination based on tumor and matched normal sequencing data. DeTiN models the normal sample as a mixture of normal cells and an unknown fraction of contaminating tumor cells. The estimates are built on both candidate somatic single nucleotide variants (SSNVs) and allele-specific somatic copy number alterations (aSCNAs). This software can also recover somatic variants by using both SSNVs and indels.
Estimates tumor purity and subclonality using next-generation sequencing (NGS) data. PurBayes is a Bayesian mixture modeling approach that uses the MCMC software JAGS. The software was tested using simulation studies. For the homogeneous tumor simulations, PurBayes correctly identified tumor homogeneity in all replications. It can also facilitate inference regarding tumor composition and evolution, as well as the isolation of potential founder events.
Estimates the fraction of tumor DNA molecules that is different from the normal matched tissue. PurityEst is a method that gives a purity estimate from somatic mutations in each chromosome and takes an average of the chromosome-wide estimates to be the purity estimate of the tumor tissue. The software can be used for determining tumor purity based on mutant allele fractions in a mixture of a tumor clone and a normal clone.
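The averaging step described above can be sketched as follows. This is a minimal illustration and not PurityEst's implementation: it assumes every input mutation is heterozygous and lies in a copy-neutral diploid region, so that the expected mutant allele fraction is half the purity. The function name and the input tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def estimate_purity(mutations):
    """Average per-chromosome purity estimates.

    mutations: iterable of (chromosome, alt_reads, total_reads) tuples.
    Assumes heterozygous somatic mutations in diploid regions, so that
    purity is approximately 2 * mutant allele fraction.
    """
    fractions_by_chrom = defaultdict(list)
    for chrom, alt_reads, total_reads in mutations:
        if total_reads > 0:  # skip uncovered sites
            fractions_by_chrom[chrom].append(alt_reads / total_reads)
    if not fractions_by_chrom:
        raise ValueError("no covered mutations supplied")

    # One purity estimate per chromosome, capped at 1.
    per_chrom = {
        chrom: min(1.0, 2 * mean(fractions))
        for chrom, fractions in fractions_by_chrom.items()
    }
    # The sample-level estimate is the mean of the chromosome-wide estimates.
    return mean(list(per_chrom.values())), per_chrom
```

Averaging per chromosome first, rather than pooling all mutations, keeps a chromosome with many mutations from dominating the sample-level estimate.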
Quantifies the percentage of reads supporting a considered aberration from clinical tumors. CLONET exploits each individual's genetic background by using the abundant germline heterozygous SNP genotype data provided by whole genome sequence coverage. It allows comparison of tumor types within the same aberration class and of different aberrations within the same tumor type. The tool is based on a local optimization in which estimates of purity and ploidy are derived from a few clonal events.
Provides quantitative variant callers for detecting subclonal mutations in ultra-deep sequencing experiments. DeepSNV is a comparative targeted deep-sequencing approach combined with a customised statistical algorithm, which can detect and quantify subclonal single-nucleotide variants (SNVs) in mixed populations. The deepSNV algorithm is used in a comparative setup with a control experiment of the same loci. The shearwater algorithm computes a Bayes classifier based on a beta-binomial model for variant calling with multiple samples, which allows model parameters to be estimated precisely.
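To show why a beta-binomial model is useful for count data of this kind, the sketch below computes the probability of seeing at least a given number of variant reads under a beta-binomial background error model. It is not the deepSNV or shearwater algorithm. The function names are hypothetical, and the shape parameters `alpha` and `beta` are assumed to have been estimated elsewhere, for example from control samples.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(X = k) for X ~ BetaBinomial(n, alpha, beta)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))


def tail_probability(k, n, alpha, beta):
    """P(X >= k): chance of k or more variant reads out of n from errors alone."""
    return sum(beta_binomial_pmf(i, n, alpha, beta) for i in range(k, n + 1))
```

Compared with a plain binomial, the beta-binomial lets the per-site error rate vary between samples, which gives heavier tails and fewer spurious calls at sites with overdispersed background noise.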
An algorithm that estimates the tumor purity and clonal/subclonal copy number aberrations directly from high-throughput DNA sequencing data. THetA successfully estimates normal admixture and recovers clonal and subclonal copy number aberrations in real and simulated sequencing data.
A method for phylogenetic reconstruction and heterogeneity quantification based on a minimum event distance for intra-tumour copy-number comparisons. Given multiple such evolutionarily-related copy-number profiles, for example from distinct primary and metastatic sites of the same patient, phylogenetic inference in MEDICC then involves three steps: (i) allele-specific assignment of major and minor copy-numbers, (ii) estimation of evolutionary distances between samples followed by tree inference and (iii) reconstruction of ancestral genomes. The MEDICC algorithms are independent of the experimental techniques used and are applicable to both next-generation sequencing and array CGH data.
A tool for identification of copy number changes from diverse sequencing experiments including whole-genome matched tumor-normal and single-sample normal re-sequencing, as well as whole-exome matched and unmatched tumor-normal studies. In addition to variant calling, Canvas infers genome-wide parameters such as cancer ploidy, purity and heterogeneity. It provides fast and simple to execute workflows that can scale to thousands of samples and can be easily incorporated into existing variant calling pipelines.
A method that can be applied to WGS data from one or more tumor samples to reconstruct complete genotypes of tumor subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. Unlike all previous methods, PhyloWGS appropriately corrects the population frequencies of simple somatic mutations (SSMs) in regions overlapping copy number variations (CNVs), and is fast enough to perform reconstruction of at least five cancerous subpopulations based on thousands of mutations.
Provides a method dedicated to both prediction of somatic copy number alterations (SCNAs) and assessment of tumor fractions within ultralow-pass whole-genome sequencing (ULP-WGS) data. ichorCNA is a standalone software which does not require prior knowledge of somatic single nucleotide variants (SSNVs) or SCNAs in the investigated tumors. It can be used for identifying cell-free DNA (cfDNA) samples with sufficient tumor content for whole-exome sequencing (WES) as well as for highlighting SCNAs signaling tumor biopsies.
Assists in predicting subclonal copy number alterations (CNA) and loss of heterozygosity (LOH) from tumour whole genome sequencing (WGS) data. TITAN infers the clonal clusters of events along with estimates of their cellular prevalence, which is the proportion of tumour cells harbouring an event. It also estimates normal contamination and tumour ploidy, and can be run on whole exome sequencing (WES) data.
Infers regions of loss of heterozygosity (LOH) from paired tumor-normal data. APOLLOH is a nonstationary hidden Markov model (HMM) that predicts regions of LOH in genome sequencing data of cancers. The software can complement the arsenal of computational tools designed for cancer focused sequencing studies. It was applied to 23 triple-negative breast cancer genomes sequenced to about 30x coverage on two massively parallel sequencing platforms.
A technique to help reconstruct the history of rearrangements responsible for cancer genome karyotypes. This uses allelic copy number segmentation, rearrangements, and somatic single-nucleotide mutation distributions, and so is based entirely on the final observed portfolio of mutations. The simplest application of this method is to construct digital karyotypes with path-walking techniques that have classically required chromosomal painting.
Automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples.
Examines somatic variation events (such as copy number changes, loss of heterozygosity, or point mutations) in order to identify the underlying subclone structure, i.e. the subclones including the normal (non-cancerous) cells and their cellular frequencies within the tumor tissue. In contrast to other methods that require SNV allele frequencies, Subcloneseeker is able to analyze many different types of genomic variant data, as long as allele frequency measurements can be converted into cell prevalence values.
A probabilistic framework to reconstruct intra-tumor evolutionary pathways. The statistical model is based on simultaneously assigning markers of evolution to clones, which are represented as both inner nodes and leaves of a phylogenetic tree, and on learning the topology and the parameters of the tree. We use a tree-structured stick-breaking process (TSSB) to construct a prior probability of trees and a Markov chain Monte Carlo (MCMC) inference scheme for sampling from the joint posterior. The relationships between parent and child nodes are derived from a classical phylogeny model.
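As background for the prior described above, the sketch below draws mixture weights from an ordinary (flat) stick-breaking process. It is deliberately not the tree-structured variant used by the method, which additionally breaks sticks along the depth and breadth of a tree. The function name, the truncation at `n_components` and the concentration parameter `alpha` are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, n_components, rng=None):
    """Draw truncated stick-breaking weights with Beta(1, alpha) breaks.

    Returns (weights, remaining), where `remaining` is the length of the
    stick left unassigned after n_components breaks.
    """
    rng = rng or random.Random()
    remaining = 1.0
    weights = []
    for _ in range(n_components):
        # Break off a Beta(1, alpha)-distributed fraction of what is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining
```

Smaller values of `alpha` concentrate most of the mass on the first few components, which in a clonal setting corresponds to a prior preference for few dominant clones.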
A computational method for reconstructing the sequence of copy number changes driving carcinogenesis, based on the analysis of several tumor samples from the same patient. TuMult is a valuable tool for the establishment of clonal relationships between tumor samples and the identification of chromosome aberrations occurring at crucial steps in cancer progression.
A somatic point mutation caller for tumor-normal paired samples in next-generation sequencing (NGS) data. MuSE models the evolution of the reference allele to the allelic composition of the matched tumor and normal tissue at each genomic locus. To improve overall accuracy, we further adopt a sample-specific error model to identify cutoffs, reflecting the variation in tumor heterogeneity among samples.