Cancer evolution software tools | Phylogenomics data analysis
Cancer can result from the accumulation of different types of genetic alterations, such as copy number aberrations. Tumor data are cross-sectional and do not record the temporal order of the genetic events. Determining the order in which genetic events occurred, and the pathways along which tumors progress, is of vital importance for understanding the disease.
Simulates a model of tumour evolution that shows how short-range dispersal and cell turnover can account for rapid cell mixing inside the tumour. The TumourSimulator model shows that even a small selective advantage of a single cell within a large tumour allows the descendants of that cell to replace the precursor mass in a clinically relevant time frame. It has also been demonstrated that the same mechanisms can be responsible for the rapid onset of resistance to chemotherapy. The model provides insights into spatial and temporal aspects of tumour growth and also suggests that targeting short-range cellular migratory activity could have marked effects on tumour growth rates. The source code can be downloaded for local use, and an interactive version for MS Windows lets the user set the simulation parameters and record video.
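The mechanism described above can be illustrated with a toy spatial Moran process: short-range dispersal plus cell turnover lets an advantaged lineage spread. This is a minimal sketch under stated assumptions, not the TumourSimulator code. Cells sit on a periodic 2D lattice. At each step a random cell dies and is replaced by the offspring of one of its four neighbours, chosen with probability proportional to fitness. A single mutant has a hypothetical fitness of `1 + s`. `simulate_sweep` and its parameters are hypothetical names.

```python
import random


def simulate_sweep(size=20, s=0.1, steps=20000, seed=1):
    """Toy spatial Moran (death-birth) process on a size x size torus.

    Each step, one random cell dies and its site is refilled by a copy of
    one of its four nearest neighbours (short-range dispersal), chosen with
    probability proportional to fitness: 1 for precursor cells, 1 + s for
    mutant cells. Returns the final fraction of mutant cells.
    """
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]  # 0 = precursor, 1 = mutant
    grid[size // 2][size // 2] = 1  # a single advantaged cell
    for _ in range(steps):
        i, j = rng.randrange(size), rng.randrange(size)  # cell that dies
        neighbours = [((i + di) % size, (j + dj) % size)
                      for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        weights = [1.0 + s * grid[a][b] for a, b in neighbours]
        a, b = rng.choices(neighbours, weights=weights)[0]
        grid[i][j] = grid[a][b]  # offspring fills the vacated site
    return sum(map(sum, grid)) / (size * size)
```

You can repeat the run over many seeds to see how often a single advantaged cell is lost to drift and how often it takes over the lattice.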
A probabilistic graphical model for the accumulation of mutations and their interdependencies. This Bayesian network models cancer progression as an explicit, unobservable accumulation process in time, separated from the observable but error-prone detection of mutations. Model parameters are estimated by an expectation-maximization (EM) algorithm, and the underlying interaction graph is obtained by simulated annealing.
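The separation between the hidden accumulation process and error-prone detection can be sketched as an observation layer placed on top of a distribution over hidden genotypes. This minimal sketch makes two assumptions: a single symmetric per-site error rate `eps`, and a hypothetical hidden-genotype distribution supplied as a dict. The EM and simulated annealing steps are not shown, and all names are illustrative.

```python
from itertools import product


def all_genotypes(n):
    """All 0/1 genotypes over n mutations."""
    return list(product((0, 1), repeat=n))


def detection_prob(observed, hidden, eps):
    """P(observed | hidden): each call is flipped independently with prob eps."""
    p = 1.0
    for y, x in zip(observed, hidden):
        p *= eps if y != x else 1.0 - eps
    return p


def marginal_prob(observed, hidden_dist, eps):
    """P(observed) = sum over hidden genotypes x of P(x) * P(observed | x).

    hidden_dist: dict mapping hidden genotype tuples to probabilities
    (assumed to sum to 1).
    """
    return sum(px * detection_prob(observed, x, eps)
               for x, px in hidden_dist.items())
```

Take `dist = {(0, 0): 0.5, (1, 0): 0.3, (1, 1): 0.2}`, in which mutation 2 requires mutation 1, and `eps = 0.1`. The ordering-violating observation `(0, 1)` still receives probability 0.066, because it can arise purely from detection errors.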
Allows users to infer evolutionary pathways. HyperTraPS implements an algorithm for sampling rare evolutionary paths on a hypercubic transition network. For example, it can be used to calculate the probabilities of different patterns of mitochondrial DNA (mtDNA) gene loss, given observed changes in mtDNA across the tree of life and uninformative uniform priors on transition probabilities. It can be applied in fields such as cancer progression, antibiotic resistance in tuberculosis, chloroplast evolution, and paleontology.
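On a hypercubic transition network, each state is a set of acquired features. A path adds one feature at a time until all are present. The sketch below computes the probability of one complete acquisition order. It assumes that, from each state, the next feature is chosen with probability proportional to a user-supplied `rate(state, feature)` function. That function and all names are hypothetical, not HyperTraPS's own parameterization.

```python
from itertools import permutations


def all_paths(n):
    """Every complete acquisition order of n features (n! of them)."""
    return list(permutations(range(n)))


def path_probability(path, rate):
    """Probability of acquiring features in the order given by `path`.

    path: a full ordering of the features 0..n-1.
    rate(state, i): non-negative weight for acquiring feature i from
    `state` (a frozenset of already acquired features).
    """
    n_features = len(path)
    state = frozenset()
    prob = 1.0
    for feature in path:
        candidates = [i for i in range(n_features) if i not in state]
        total = sum(rate(state, i) for i in candidates)
        prob *= rate(state, feature) / total  # normalized transition probability
        state = state | {feature}
    return prob
```

Consider `rate = lambda state, i: 3.0 if (i == 1 and 0 in state) else 1.0`. Here feature 1 becomes three times as likely once feature 0 is present, so path `(0, 1, 2)` has probability 1/3 × 3/4 = 1/4. Because the number of orderings grows as n!, enumerating with `all_paths` is only practical for a handful of features; this is why a sampling approach such as HyperTraPS's is needed for larger systems.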
Infers cancer progression models from heterogeneous genomic data. TRONCO is an R package built to extract population-level models describing the trends of accumulation of alterations in a cohort of cross-sectional samples and individual-level models that reveal the clonal evolutionary history in single cancer patients, when multiple samples are available. It also implements an oncoprint system to visualize the processed data.
Allows the identification of mutational signatures within a single tumor sample. deconstructSigs determines the linear combination of predefined signatures that most accurately reconstructs the mutational profile of a single tumor sample. It uses a multiple linear regression model with the constraint that every coefficient must be non-negative, as negative contributions make no biological sense. Applying deconstructSigs identifies samples with DNA repair deficiencies and reveals distinct and dynamic mutational processes shaping the genome of esophageal adenocarcinoma compared with squamous cell carcinoma. deconstructSigs makes it possible to define mutational processes driven by environmental exposures, DNA repair abnormalities, and other mutagenic processes in individual tumors, with implications for precision cancer medicine.
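The constrained regression described above amounts to non-negative least squares (NNLS): find weights w ≥ 0 that minimize ‖Σₖ wₖ·sigₖ − profile‖². The sketch below solves it by cyclic coordinate descent. That solver is swapped in for illustration; deconstructSigs's own fitting procedure may differ in detail. The 4-channel signatures in the test are toy data, since real signatures typically have 96 trinucleotide channels.

```python
def nnls_cd(signatures, profile, n_iter=500):
    """Non-negative least squares by cyclic coordinate descent.

    signatures: list of K signatures, each a list of M channel weights.
    profile: list of M observed mutation fractions for one tumour.
    Returns K non-negative weights minimizing the squared reconstruction error.
    """
    weights = [0.0] * len(signatures)
    residual = [-p for p in profile]  # residual = reconstruction - profile
    for _ in range(n_iter):
        for k, sig in enumerate(signatures):
            norm = sum(v * v for v in sig)
            if norm == 0:
                continue
            grad = sum(v * r for v, r in zip(sig, residual))
            # exact minimizer along coordinate k, projected onto w_k >= 0
            new_w = max(0.0, weights[k] - grad / norm)
            delta = new_w - weights[k]
            if delta:
                residual = [r + delta * v for r, v in zip(residual, sig)]
                weights[k] = new_w
    return weights
```

The projection step `max(0.0, ...)` enforces the non-negativity constraint directly, so no coefficient can take the biologically meaningless negative values.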
Checks the infinite sites assumption (ISA) in single-cell sequencing data. The method is designed for real tumor data: it exploits the power of single-cell sequencing to obtain high-resolution pictures of tumor evolution while accounting for the noise in such data. The tool was validated in simulation studies and then applied to a variety of single-cell sequencing data sets, uncovering widespread violations of the ISA in human cancers.
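The classic deterministic way to spot ISA violations is the four-gamete test, sketched below for a rooted tumor phylogeny. The unmutated normal ancestor implicitly supplies the (0, 0) state. So if cells also show (0, 1), (1, 0) and (1, 1) for a pair of mutations, no tree with single, irreversible mutation events can explain the pair. This is a simplified illustration, not the tool's method: the tool models sequencing noise probabilistically, whereas this check would count every false call as a violation.

```python
from itertools import combinations


def isa_violations(matrix):
    """Return mutation pairs that fail the (rooted) four-gamete test.

    matrix: list of cells, each a list of 0/1 calls for the same mutations.
    """
    n_mut = len(matrix[0])
    bad = []
    for a, b in combinations(range(n_mut), 2):
        # the unmutated normal ancestor always contributes the (0, 0) gamete
        gametes = {(cell[a], cell[b]) for cell in matrix} | {(0, 0)}
        if len(gametes) == 4:
            bad.append((a, b))
    return bad
```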
A software package to estimate mixture models of mutagenetic trees from observed cross-sectional data. Mutagenetic tree mixtures are probabilistic models that have been designed to describe evolutionary processes that are characterized by the accumulation of genetic changes. Mtreemix has been applied to model the development of drug resistance-associated mutations in the HIV genome and the accumulation of chromosomal gains and losses in tumor development.
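Under a single mutagenetic tree, a genotype has non-zero probability only if every present event's parent is also present; the root counts as always present. Each edge whose parent is present contributes p (child present) or 1 − p (child absent). A mixture then averages these tree probabilities with mixing weights. The sketch below uses the hypothetical convention that `0` labels the root and positive integers label events; it is not Mtreemix's code.

```python
def tree_prob(genotype, parent, p):
    """Probability of a genotype under one mutagenetic tree.

    genotype: set of events present in the sample.
    parent: dict event -> parent event (0 denotes the root).
    p: dict event -> probability the event occurs given its parent occurred.
    """
    prob = 1.0
    for event, par in parent.items():
        parent_present = par == 0 or par in genotype
        if event in genotype:
            if not parent_present:
                return 0.0  # genotype incompatible with this tree
            prob *= p[event]
        elif parent_present:
            prob *= 1.0 - p[event]
    return prob


def mixture_prob(genotype, components):
    """components: list of (weight, parent, p) triples; weights sum to 1."""
    return sum(w * tree_prob(genotype, parent, p)
               for w, parent, p in components)
```

A mixture lets a genotype such as `{2}`, which is impossible under a chain tree 0 → 1 → 2, still receive probability from another component, for example a star-shaped tree.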
A generative probabilistic model for detecting patterns of various degrees of mutual exclusivity across genetic alterations, which can indicate pathways involved in cancer progression. TiMEx explicitly accounts for the temporal interplay between the waiting times to alterations and the observation time. In simulation studies, the model outperforms previous methods for detecting mutual exclusivity. On large-scale biological datasets, TiMEx identifies gene groups with strong functional biological relevance while also proposing new candidates for biological validation. TiMEx has several advantages over previous methods, including a novel generative probabilistic model of tumorigenesis, direct estimation of the probability of a mutual exclusivity interaction, computational efficiency, and high sensitivity in detecting gene groups involving low-frequency alterations.
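Why the observation time matters can be seen in a minimal sketch. It assumes exponential waiting times for each alteration (rates `rate1`, `rate2`) and for the observation (rate `rate_obs`), and shows only the fully independent baseline, not TiMEx's mutual-exclusivity parameterization. An alteration is observed if it occurs before the observation time. Because two alterations share the same observation time, their observed co-occurrence exceeds the product of their marginal frequencies even when they are independent. For rates of 1, the values are 1/3 versus 1/4. The closed form follows from inclusion-exclusion over the exponential races.

```python
import random


def p_observed(rate, rate_obs=1.0):
    """P(alteration occurs before observation): an exponential race."""
    return rate / (rate + rate_obs)


def p_both_observed(rate1, rate2, rate_obs=1.0):
    """P(two independent alterations both precede a shared observation time)."""
    return (1.0
            - rate_obs / (rate1 + rate_obs)
            - rate_obs / (rate2 + rate_obs)
            + rate_obs / (rate1 + rate2 + rate_obs))


def simulate_both(rate1, rate2, rate_obs=1.0, n=100_000, seed=0):
    """Monte Carlo estimate of p_both_observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        t_obs = rng.expovariate(rate_obs)
        if rng.expovariate(rate1) < t_obs and rng.expovariate(rate2) < t_obs:
            hits += 1
    return hits / n
```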
Reconstructs evolutionary paths and ancestral genotypes from sequenced tumor samples. BML first estimates the probability P(g) that a particular combination of mutations (denoted by genotype g) reaches fixation in a cell population that has evolved from a normal cell genotype and will eventually attain a tumor cell genotype. BML uses both observed tumor samples and imputed evolutionary paths to estimate P(g). The evolutionary probabilities are represented by a Bayesian network (up to an overall normalizing factor) that is optimized for the best choice of imputed paths. Once a Bayesian network is selected, a recursive algorithm is used to infer the likely Evolutionary Progression Paths (EPP). This software package is freely available for download.
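Inferring a most likely progression path naturally lends itself to a recursion over genotypes: the best path to g extends the best path to g minus one mutation. The sketch below implements that recursion with memoization. It assumes a simplified, hypothetical path score (the product of P(g) over the genotypes visited) rather than BML's exact path probability. The `prob` dict of P(g) values is supplied by the caller.

```python
from functools import lru_cache


def most_likely_path(target, prob):
    """Highest-scoring order in which the mutations in `target` were acquired.

    target: iterable of mutations in the observed tumour genotype.
    prob: dict frozenset -> P(g) for intermediate genotypes (missing = 0).
    Returns (score, path), where path is a tuple of mutations in order.
    """
    @lru_cache(maxsize=None)
    def best(g):
        if not g:
            return 1.0, ()  # the normal (unmutated) genotype
        options = []
        for m in g:
            score, path = best(g - {m})  # best path to g without mutation m
            options.append((score * prob.get(g, 0.0), path + (m,)))
        return max(options)

    return best(frozenset(target))
```

The recursion visits each subset of `target` once, so it is exponential in the number of mutations and suits only small genotypes.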
Allows estimation of cancer progression. bcbn is a Bayesian inference scheme for conjunctive Bayesian networks (CBNs). It implements a Markov chain Monte Carlo (MCMC) sampling algorithm that uses a set of moves specifically designed to escape local optima in the space of dependency structures.
Offers an environment for estimating mutagenetic tree mixture models from cross-sectional data and using them for various predictions. Rtreemix includes functions for fitting the tree mixture models, likelihood computations, model comparisons, waiting time estimation, stability analysis, and more. It takes advantage of the high-level interface, the statistical tools, and the large amount of data that the R and Bioconductor projects provide. For estimating mixture models, the package builds on efficient C/C++ code from a modified version of the Mtreemix software, which was made independent of the LEDA library so that Rtreemix could be released as a free R package. Rtreemix implements the main functionality of Mtreemix for model fitting and adds new functions for estimating genetic progression scores with corresponding confidence intervals and for performing model analysis.
Investigates tumor evolution. MIPUP is a multi-sample method that aims to reconstruct the ancestral relationships between clones and samples. It uses an integer linear programming (ILP) formulation that exploits a relationship between perfect phylogenies and branchings in a directed acyclic graph. It can report either all optimal solutions or a user-defined number of them.
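The perfect-phylogeny condition behind such formulations can be checked directly when the data are error-free. A binary sample × mutation matrix admits a rooted perfect phylogeny if and only if every pair of mutation columns is either nested or disjoint. In that case, each mutation's parent is the smallest mutation column that contains it. The sketch below covers only this noise-free case; MIPUP's ILP is not reproduced, and the function name is hypothetical.

```python
def perfect_phylogeny(matrix):
    """Build a rooted perfect phylogeny from a binary sample x mutation matrix.

    Returns {mutation: parent mutation, or None if attached to the root},
    or None if some pair of columns is neither nested nor disjoint.
    Assumes every mutation is present in at least one sample.
    """
    n_mut = len(matrix[0])
    cols = [frozenset(s for s, row in enumerate(matrix) if row[j])
            for j in range(n_mut)]
    for a in range(n_mut):
        for b in range(a + 1, n_mut):
            if cols[a] & cols[b] and not (cols[a] <= cols[b] or cols[b] <= cols[a]):
                return None  # conflicting pair: no perfect phylogeny exists
    # Larger columns first, so every ancestor is placed before its descendants.
    order = sorted(range(n_mut), key=lambda j: (-len(cols[j]), j))
    parent = {}
    for pos, j in enumerate(order):
        parent[j] = None
        # Supersets of j form a chain; the last one placed is the smallest.
        for k in reversed(order[:pos]):
            if cols[j] <= cols[k]:
                parent[j] = k
                break
    return parent
```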
Allows analysis of the accumulation of mutations. MC-CBN can be used to model HIV evolution or the waiting-time process of mutation accumulation under temporal ordering constraints. It includes an algorithm that can handle a large number of mutations.
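The waiting-time process under ordering constraints can be made concrete with a forward simulation of a continuous-time CBN. Each mutation's waiting clock starts only once all of its required predecessors have occurred: Tᵢ = max(T of its parents) + Exp(rateᵢ). A mutation is observed if it occurs before an exponential observation time. This sketch illustrates the generative model only; it is not MC-CBN's inference algorithm. It assumes the `parents` dict describes an acyclic constraint structure.

```python
import random


def sample_cbn(parents, rates, rng, rate_obs=1.0):
    """Sample one observed genotype from a continuous-time CBN.

    parents: dict mutation -> set of mutations that must occur first (acyclic).
    rates: dict mutation -> rate of its exponential waiting time.
    """
    times = {}

    def waiting_time(i):
        if i not in times:
            # the clock for i starts after its last required predecessor occurs
            start = max((waiting_time(p) for p in parents[i]), default=0.0)
            times[i] = start + rng.expovariate(rates[i])
        return times[i]

    t_obs = rng.expovariate(rate_obs)
    return frozenset(i for i in parents if waiting_time(i) < t_obs)


def sample_many(parents, rates, n, seed=0):
    rng = random.Random(seed)
    return [sample_cbn(parents, rates, rng) for _ in range(n)]


def is_compatible(genotype, parents):
    """True if every present mutation has all of its predecessors present."""
    return all(parents[i] <= genotype for i in genotype)
```

Every sampled genotype respects the ordering constraints by construction, because a parent always occurs before its child.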
Simulates homogeneous and inhomogeneous stochastic branching processes under a very flexible set of assumptions. SIApopr simulates clonal evolution with the emergence of driver and passenger mutations under the infinite-allele assumption. It adapts the Stochastic Simulation Algorithm (SSA) to simulate an infinite-allele branching process in which each mutant cell founds a unique type whose birth and death rates are random variables.
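A minimal Gillespie (SSA) sketch of a homogeneous, infinite-allele birth-death process follows. Each clone carries its own birth and death rates. The time to the next event is exponential with the total propensity, and the event is chosen in proportion to its propensity. At a division, with probability `mu`, the daughter founds a brand-new clone. The new clone's birth rate is drawn as the parent's times Uniform(0.9, 1.2), a hypothetical distribution chosen only for illustration. This is not SIApopr's code, and all names are hypothetical.

```python
import random


def simulate_clones(birth=1.0, death=0.5, mu=0.01, t_max=10.0,
                    max_cells=10_000, seed=0):
    """Gillespie simulation of an infinite-allele birth-death process.

    Returns a list of [size, birth_rate, death_rate], one entry per clone.
    """
    rng = random.Random(seed)
    clones = [[1, birth, death]]
    t = 0.0
    while True:
        weights = []
        for n, b, d in clones:
            weights += [n * b, n * d]  # birth and death propensities
        total = sum(weights)
        if total == 0 or sum(c[0] for c in clones) >= max_cells:
            break  # extinction or population cap reached
        t += rng.expovariate(total)  # exponential time to the next event
        if t > t_max:
            break
        k = rng.choices(range(len(weights)), weights=weights)[0]
        clone = clones[k // 2]
        if k % 2 == 1:
            clone[0] -= 1  # death
        elif rng.random() < mu:
            # daughter founds a unique new allele with randomly drawn birth rate
            clones.append([1, clone[1] * rng.uniform(0.9, 1.2), clone[2]])
        else:
            clone[0] += 1  # faithful division
    return clones
```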
Enables copy number profiling and downstream analyses in disease genetic studies. MARATHON is a pipeline that integrates several statistical tools: CODEX and CODEX2 perform read-depth normalization for total copy number profiling; iCNV takes read depth normalized by CODEX/CODEX2 as input; FALCON and FALCON-X perform allele-specific copy number (ASCN) analysis; and Canopy takes input from FALCON/FALCON-X to reconstruct tumor phylogenies. The pipeline adapts to different study designs and research goals.
An approach for modelling the dependences between genetic changes in human tumours. oncomodel computes probabilistic tree models of oncogenesis from genetic data using maximum likelihood. In this approach, genetic tumour development and progression are assumed to follow a probabilistic tree model.
Reconstructs tumor subclonal phylogenies using somatic mutation cellularities in a patient's tumor sample(s). SCHISM combines information about somatic mutation cellularity (also called mutation cancer cell fraction) across all tumor samples available from a patient in a hypothesis testing framework, to identify the statistical support for the lineage relationship between each pair of mutations or mutation clusters. The results of the hypothesis tests are represented as a Cluster Order Precedence Violation (CPOV) matrix, which informs the subsequent step in SCHISM and ensures that candidate tree topologies comply with the lineage precedence rule. Next, a genetic algorithm (GA) explores the space of tree topologies and returns a prioritized list of the candidate subclonal phylogenetic trees most compatible with the observed cellularity data.
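The lineage precedence rule says that an ancestral cluster's cellularity is at least that of its descendant in every sample. The sketch below makes a deterministic, tolerance-based stand-in for the violation matrix: for every ordered pair it counts the samples that break "a is ancestral to b", then rejects candidate trees whose edges include a violating pair. SCHISM itself uses hypothesis tests that account for measurement noise, and its GA search is not shown. Names and the `tol` parameter are hypothetical.

```python
def precedence_violations(cellularity, tol=0.0):
    """Count, for each ordered pair (a, b), samples violating 'a is ancestral to b'.

    cellularity: dict cluster -> list of cellularities, one per tumour sample.
    A violation is a sample where cellularity[a] < cellularity[b] - tol.
    """
    clusters = sorted(cellularity)
    return {(a, b): sum(ca < cb - tol
                        for ca, cb in zip(cellularity[a], cellularity[b]))
            for a in clusters for b in clusters if a != b}


def tree_is_consistent(parent, violations):
    """True if no parent -> child edge of the candidate tree has a violation.

    parent: dict cluster -> parent cluster (None for the root).
    """
    return all(violations[(p, c)] == 0
               for c, p in parent.items() if p is not None)
```

When two clusters violate precedence in both directions, as B and C do in the test data, neither can be ancestral to the other, so a consistent tree must place them on separate branches.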
A clustering method for cancer subclonal evolutionary trees, in which subgroups of trees are identified based on topology and edge length attributes. phyC can detect true clusters with sufficient accuracy. Applying the method to actual multi-regional sequencing data from clear cell renal cell carcinoma and non-small cell lung cancer allowed the detection of clusters related to cancer type or phenotype.
Deduces, from cross-sectional data on genetic alterations in tumor patients, the causal dependencies and the waiting times among these genetic events. From a matrix with genetic events as rows and patient samples as columns, TO-DAG generates a probabilistic graph model whose nodes represent genetic events and whose directed edges indicate the presence and direction of a causal dependency between nodes. A directed acyclic graph, i.e. a graph with no directed cycles, was specifically chosen as the model of putative causal dependencies because genetic alterations are assumed to be irreversible events. Each edge is defined by two parameters: (i) its probability, estimated from the frequency of occurrence of the genetic events represented by the nodes and their conditional probability, and (ii) the waiting time, i.e. the time elapsing from the occurrence of one mutation to the occurrence of another that is conditionally dependent on it.
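The edge-probability ingredients can be computed directly from the events × samples matrix: the frequency of each event and the empirical conditional probability P(j | i). The `candidate_edges` rule below is a hypothetical heuristic for illustration, not TO-DAG's estimator. It keeps i → j only if i is at least as frequent as j and P(j | i) > P(j). The frequency condition follows from irreversibility: if j depends on i, every sample carrying j must also carry i. Waiting times are not estimated here.

```python
def event_frequencies(matrix):
    """matrix: rows = genetic events, columns = patient samples (0/1)."""
    return [sum(row) / len(row) for row in matrix]


def conditional_prob(matrix, i, j):
    """Empirical P(event j present | event i present) across samples."""
    with_i = [s for s, v in enumerate(matrix[i]) if v]
    if not with_i:
        return 0.0
    return sum(matrix[j][s] for s in with_i) / len(with_i)


def candidate_edges(matrix):
    """Hypothetical rule: i -> j if P(i) >= P(j) and P(j | i) > P(j)."""
    freq = event_frequencies(matrix)
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and freq[i] >= freq[j]
            and conditional_prob(matrix, i, j) > freq[j]]
```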
Displays changes in clonal structure over time. fishplot produces illustrative, easy-to-interpret graphs suitable for diagnosis, presentation, and publication. It shows how an individual tumor changes over time, making analysis and diagnosis easier. The tool can also be applied to other domains, for example to represent the changing landscapes of microbial populations, and it is useful for studying tumor evolution.