Identifies driver mutations, rather than driver genes. Hotspots is a computational algorithm developed from a curated repository of cancer genome data consisting of the sequenced tumor exomes and whole genomes of more than 11,000 human tumors representing 41 tumor types. It identifies population-scale recurrent mutations in cancer using a binomial statistical model that incorporates underlying mutational processes, including nucleotide context mutability, gene-specific mutation rates, and major expected patterns of hotspot mutation emergence.
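The binomial model described above can be illustrated with a minimal sketch. It is not the tool's implementation: the function names and the way the background probability is formed (a gene-specific rate multiplied by a context mutability factor) are assumptions made for illustration only.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large cohorts do not overflow.
    """
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))

    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def hotspot_p_value(mutated_samples, total_samples, gene_rate, context_mutability):
    """P-value for seeing at least `mutated_samples` hits at one site.

    Assumption (hypothetical background model): the per-sample probability of
    a mutation at this site is gene_rate * context_mutability.
    """
    site_probability = gene_rate * context_mutability
    return binomial_tail(mutated_samples, total_samples, site_probability)
```

A site mutated in 20 of 11,000 tumors under a background probability of 1e-4 per tumor would receive a very small p-value, whereas a single hit would not be notable.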

MutSig / Mutation Significance

Analyzes lists of mutations discovered in DNA sequencing to identify genes that were mutated more often than expected by chance. MutSig was originally developed for analyzing somatic mutations, but it has also been useful in analyzing germline mutations. It builds a model of the background mutation processes that were at work during formation of the tumors, then tests the mutations of each gene against that background model.

InVEx / Introns Vs Exons

A permutation-based method for ascertaining genes with a somatic mutation distribution showing evidence of positive selection for non-silent mutations. The method was developed for use in cancer genomics studies, with particular relevance to high mutation rate cancers. Mutations are permuted on a per-patient, per-trinucleotide-context basis across the covered exon, intron and UTR base pairs of a gene, generating a null model of the distribution of mutations to which the observed distribution can be compared to determine statistical significance. Significant genes are of interest, as their somatic mutation is likely to be important in the formation of the cancer being studied. The method can operate on whole exome as well as whole genome sequencing data.
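The permutation scheme described above can be sketched as follows. This is a simplified illustration, not the InVEx code: the input layout, the function name, and the use of a plain non-silent count as the test statistic are assumptions. Each mutation is moved to a random covered position with the same trinucleotide context, which keeps every patient's per-context mutation counts fixed.

```python
import random
from collections import defaultdict


def invex_like_p_value(positions, mutations, n_perm=1000, seed=0):
    """Permutation p-value for an excess of non-silent mutations in one gene.

    positions: list of (context, is_nonsilent) for every covered base
               (exon, intron and UTR); is_nonsilent is True/False.
    mutations: list of (patient_id, position_index) for observed mutations.
    """
    rng = random.Random(seed)

    # Group covered positions by trinucleotide context.
    by_context = defaultdict(list)
    for index, (context, _) in enumerate(positions):
        by_context[context].append(index)

    observed = sum(positions[index][1] for _, index in mutations)

    hits = 0
    for _ in range(n_perm):
        permuted = 0
        for _patient, index in mutations:
            # Re-place the mutation on a covered base with the same context.
            new_index = rng.choice(by_context[positions[index][0]])
            permuted += positions[new_index][1]
        if permuted >= observed:
            hits += 1

    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```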


A computational method for identifying 'active' sites in proteins (signalling sites, protein domains, regulatory motifs) that are specifically and significantly mutated in cancer genomes. ActiveDriver provides signalling-related interpretation of single nucleotide variants (SNVs) identified in cancer genome sequencing. ActiveDriver is based on a gene-centric logistic regression model that considers multiple factors in estimating significance of mutation enrichment (or depletion) in active sites. The factors include mutation frequency, distribution of active sites in protein sequence, their position with respect to mutations (direct and flanking), and structured and disordered regions of proteins.


Serves for the functional analysis of gene expression and genomic data. Babelomics offers the possibility to explore the effects of alteration in gene expression levels or changes in genes sequences within a functional context. It provides user-friendly access to a full range of methods that cover: (1) primary data analysis; (2) a variety of tests for different experimental designs; and (3) different enrichment and network analysis algorithms for the interpretation of the results of such tests in the proper functional context.


Allows users to determine if particular changes are likely to be cancer-associated. The impact of each change is measured using two known methods: Sorting Intolerant From Tolerant (SIFT) and the Pfam-based LogR.E-value metric. A third method, the Gene Ontology Similarity Score (GOSS), provides an indication of how closely the gene in which the variant resides resembles other known cancer-causing genes. Scores from these three algorithms are analyzed by a random forest classifier which then predicts whether a change is likely to be cancer-associated.


An approach to uncover driver genes or gene modules. It computes a metric of functional impact using three well-known methods (SIFT, PolyPhen2 and MutationAssessor) and assesses how the functional impact of variants found in a gene across several tumor samples deviates from a null distribution. It is thus based on the assumption that any bias towards the accumulation of variants with high functional impact is an indication of positive selection and can thus be used to detect candidate driver genes or gene modules.
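The idea of comparing a gene's functional impact scores against a null distribution can be sketched as below. This is an illustration under stated assumptions, not the published procedure: it uses a single pre-computed impact score per variant, the mean as the summary statistic, and resampling from a pooled background as the null. The function name is hypothetical.

```python
import random


def impact_bias_p_value(gene_scores, background_scores, n_samples=10000, seed=0):
    """Empirical p-value for a bias towards high functional impact in a gene.

    gene_scores: impact scores of the variants observed in the gene.
    background_scores: impact scores pooled over all variants in the cohort.
    """
    rng = random.Random(seed)
    size = len(gene_scores)
    observed = sum(gene_scores) / size

    hits = 0
    for _ in range(n_samples):
        # Draw as many background variants as the gene carries.
        sample = rng.choices(background_scores, k=size)
        if sum(sample) / size >= observed:
            hits += 1

    return (hits + 1) / (n_samples + 1)
```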

CHASM/SNV-Box / Cancer-specific High-throughput Annotation of Somatic Mutations

A software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. CHASM includes a database of pre-computed predictive features called SNVBox that facilitates rapid feature retrieval and classification of very large SNV datasets. Furthermore, the features in SNVBox can be generally used to aid in the development of new classification algorithms that predict the impact of either germline or somatic SNVs.

MuSiC / Mutational Significance In Cancer

A set of tools aimed at determining the significance of somatic mutations discovered within a given cohort of cancer samples, incorporating the cohort's alignment data, variant lists and any relevant clinical data. The development of MuSiC was motivated by the rapidly expanding numbers of mutation data sets from a wide variety of tumor types. It is imperative during post-discovery analysis to separate the significant, or “driver,” mutations from the passenger mutations to more accurately pinpoint the key genes and pathways critical for disease initiation and progression. MuSiC is designed precisely to streamline this process into an easily accessible high-throughput software exercise.


Analyzes genome and transcriptome data for identifying and prioritizing sequence-altered genes as potential cancer drivers. HIT’nDRIVE is a combinatorial method that integrates patient-specific genomic alterations with the associated transcriptome profile, identifying driver genes that dysregulate a large portion of each patient’s transcriptome. It aims to identify the most parsimonious set of patient-specific driver genes that have sufficient “influence” over a large proportion of the expression outliers.

CRAVAT / Cancer-Related Analysis of Variants Toolkit

Performs cancer-related analysis of variants. CRAVAT returns mutation interpretations in a dynamic interactive web environment for sorting, visualizing and inferring mechanism. The software (i) performs all projecting and assigns sequence ontology, (ii) predicts mutation impact using multiple normalized bioinformatics classifiers, (iii) allows for joint prioritization of all non-silent mutation types and organizes annotation from many sources on graphical displays of protein sequence and 3D structure, and (iv) facilitates dynamic filtering. It is suitable for both large and small studies and was developed for easy integration with other software.


Infers the global impact of tumorigenic genetic and epigenetic alterations in the tissue-specific network and identifies regulatory cancer drivers. RegNetDriver can be used to analyze other cancer types, and users can expect that single nucleotide variants (SNVs), structural variants (SVs), and methylation changes play roles of varied importance in different tumor types and tissues. It can be useful for analyzing around 2,800 tumor whole genomes, transcriptomes, and epigenomes of around 40 tumor types from the upcoming Pan-Cancer Analysis of Whole Genomes project.


Assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis. There are three unique features of the SNPs3D resource. First, it is designed specifically for the analysis of the relationship between SNPs and disease. Second, it constructs gene networks based on conceptual relationships derived from the literature, rather than experimental data. Third, it integrates access to all available and relevant information sources, wherever possible giving the user easy access to the underlying data and literature, so that informed judgments can be made.


Identifies proteins with significant clusters. SpacePAC provides “localization” for mutational hotspots. It uses a three-step process to identify mutational clusters: (i) obtain the mutational and structural data, (ii) reconcile the databases so that the mutational information can be mapped onto the protein structure, and (iii) simulate the distribution of mutation locations over the protein tertiary structure and identify if any regions of the protein have observed mutational counts in the tail of the distribution.
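Step (iii) above can be illustrated with a small simulation. This is a sketch under assumptions, not the SpacePAC implementation: the statistic (the largest number of mutations inside a sphere of fixed radius centred on any residue), the uniform null over residues, and all function names are chosen here for illustration.

```python
import random
from math import dist


def max_sphere_count(coords, counts, radius):
    """Largest number of mutations inside a sphere centred on any residue."""
    best = 0
    for center in coords:
        inside = sum(count for xyz, count in zip(coords, counts)
                     if dist(center, xyz) <= radius)
        best = max(best, inside)
    return best


def sphere_cluster_p_value(coords, counts, radius, n_sim=1000, seed=0):
    """Compare the observed sphere count with a simulated null distribution.

    coords: one (x, y, z) tuple per residue, e.g. C-alpha positions.
    counts: number of observed mutations at each residue.
    """
    rng = random.Random(seed)
    observed = max_sphere_count(coords, counts, radius)
    total = sum(counts)

    hits = 0
    for _ in range(n_sim):
        # Null model (assumption): each mutation lands on a uniformly random residue.
        simulated = [0] * len(coords)
        for _mutation in range(total):
            simulated[rng.randrange(len(coords))] += 1
        if max_sphere_count(coords, simulated, radius) >= observed:
            hits += 1

    return (hits + 1) / (n_sim + 1)
```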

QuartPAC / Quaternary Protein Amino acid Clustering

Provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure. QuartPAC identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters which are otherwise missed in a variety of proteins.

CLUMPS / CLUstering of Mutations in Protein Structures

Assesses the significance of mutational clustering in a given 3D structure. CLUMPS is a statistical method that does not attempt to specify individual clusters but rather detects an overall enrichment of mutated residues that are spatially close to each other. The method uses a weighted average proximity (WAP) scoring function summarizing the pairwise Euclidean distances of all mutated residues in the structure, weighted by the normalized number of samples in which they are mutated.
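A score of this kind can be sketched as follows. The description above does not give the exact formula, so the Gaussian distance kernel, the scale `t`, and the normalization of sample counts by their total are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist, exp


def wap_score(coords, sample_counts, t=6.0):
    """Weighted proximity score over all pairs of mutated residues.

    coords: (x, y, z) of each mutated residue.
    sample_counts: number of samples in which each residue is mutated.
    t: distance scale of the kernel (hypothetical default).
    """
    total = sum(sample_counts)
    weights = [count / total for count in sample_counts]  # assumed normalization

    score = 0.0
    for q, r in combinations(range(len(coords)), 2):
        distance = dist(coords[q], coords[r])
        # Pairs that are close in space contribute close to their full weight.
        score += weights[q] * weights[r] * exp(-distance ** 2 / (2 * t ** 2))
    return score
```

In practice the observed score would be compared with scores obtained after randomly reassigning the mutated residues within the same structure; that step is omitted here.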

transFIC / transformed Functional Impact score for Cancer

A method to transform functional impact scores taking into account the differences in basal tolerance to germline SNVs of genes that belong to different functional classes. This transformation allows the scores provided by well-known tools (e.g. SIFT, PolyPhen2, MutationAssessor) to be used to rank the functional impact of cancer somatic mutations. Mutations with greater transFIC are more likely to be cancer drivers. TransFIC takes as input the Functional Impact Score of a somatic mutation observed in cancer provided by one of the aforementioned tools. It then compares that score to the distribution of scores of germline SNVs observed in genes with similar functional annotations (for instance, genes with the same molecular function as provided by the Gene Ontology). The score is thus transformed using the Z-score formula. The result is that the scores of mutations in genes that are less tolerant to germline SNVs are amplified, while the scores of mutations in relatively tolerant genes are decreased.
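The Z-score transformation described above amounts to the following minimal sketch. The function name is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def transformed_impact(score, germline_scores):
    """Z-score of a somatic mutation's impact score.

    germline_scores: impact scores of germline SNVs in genes that share the
    functional annotation of the mutated gene (at least two values).
    """
    return (score - mean(germline_scores)) / stdev(germline_scores)


# The same raw score is ranked differently depending on the gene class.
tolerant_class = [2.0, 3.0, 4.0]     # germline SNVs often score high here
intolerant_class = [0.0, 1.0, 2.0]   # germline SNVs rarely score high here
print(transformed_impact(3.0, tolerant_class))    # 0.0
print(transformed_impact(3.0, intolerant_class))  # 2.0
```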

DOTS-Finder / Driver Oncogene and Tumor Suppressor Finder

Identifies driver genes and classifies them as tumor suppressor genes (TSGs) and/or oncogenes (OGs). DOTS-Finder integrates a novel pattern-based method with a protein function approach (functional step) and a frequentist method (frequentist step). It was developed to facilitate the identification of candidate targets and be used to develop diagnostic, prognostic or therapeutic strategies, even in situations where the available data are scarce. This application can also be used to identify driver genes with atypical patterns of mutations.

SPARROW / SPARse selected expRessiOn regulators identified With penalized regression

A method to identify genes driving expression changes in cancer genome evolution from genome-wide expression data. The SPARROW method uses a sparse regression methodology, variational Bayes spike regression (VBSR), to infer the relative importance of a given candidate expression driver. This is done by fitting a sparse regression model for every gene in an expression data-set with a set of candidate expression drivers as potential drivers. Candidate drivers which are frequently chosen in the sparse bases across genes are prioritized as more likely to be true gene expression drivers.


A computational pipeline for mapping large-scale cancer exome data across patients onto protein structures, and automatically extracting proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. CanBind represents a complementary approach to existing methods, as it directly uses structural information in the context of large-scale cancer resequencing data, and is a step toward providing mechanistic interpretations of the effects of mutations. One important aspect of this approach is that it can highlight genes that may be infrequently mutated overall, but for which mutations preferentially occur in binding sites.

iSIMPRe / identification of SIgnificantly Mutated Protein Regions

Helps to interpret the effect of cancer mutations at the level of functional regions. iSIMPRe is able to pinpoint proteins and protein regions that harbor a significant amount of cancer-related mutations in an unbiased manner. It takes the list of the observed non-synonymous mutations and automatically identifies not only potential cancer driver genes, but also specific regions that are involved in the disease development.

FABRIC / Functional Alteration Bias Recovery In Coding-regions

Identifies genes with alteration bias. FABRIC is an open source application that is able to highlight genes with alteration bias in various contexts. The software attributes effect scores to mutations in coding regions by leaning on a machine-learning prediction model. It can be used in cancer genomics studies to detect alteration-promoting genes or in population genetic variation studies to identify alteration-rejecting genes.

TCI / Tumor-specific Causal Inference

Aims to find the somatic genome alterations (SGAs) that causally regulate cancer-related molecular phenotypes. TCI is a Bayesian causal inference framework, implementing an algorithm that identifies driver SGAs in a tumor-specific and signal-oriented fashion. The software infers causal relationships between SGAs and differentially expressed genes (DEGs) within a specific tumor. It unifies the frequency-oriented and signal-oriented approaches to determine the functional impact of an SGA event within a specific tumor.

CaDrA / Candidate Driver Analysis

Searches for the set of genomic alterations associated with a user-provided ranking of samples within a dataset. CaDrA is based on a stepwise heuristic search to recognize a subset of features whose union is maximally-associated with the observed sample ranking. It can carry out rigorous statistical significance testing based on sample permutation. This tool enables users to select sets of genomic features that drive certain oncogenic phenotypes in cancer.
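The stepwise search over feature unions can be sketched as below. This is an illustration only: the scoring function (a centred rank sum that rewards alterations in highly ranked samples) stands in for whatever association statistic the tool applies, and the function names are hypothetical. Permutation-based significance testing is omitted.

```python
def centered_rank_sum(altered):
    """Score a 0/1 vector whose entries are ordered from top- to bottom-ranked sample.

    Alterations in the top half add to the score, those in the bottom half subtract.
    """
    midpoint = (len(altered) - 1) / 2
    return sum(midpoint - index for index, flag in enumerate(altered) if flag)


def greedy_union_search(features):
    """Forward stepwise search for the feature union with the highest score.

    features: dict mapping feature name -> 0/1 list over the ranked samples.
    Returns the chosen feature names (in order of selection) and the final score.
    """
    n_samples = len(next(iter(features.values())))
    chosen, union, best = [], [0] * n_samples, 0.0

    while True:
        candidates = []
        for name, vector in features.items():
            if name in chosen:
                continue
            merged = [u | v for u, v in zip(union, vector)]
            candidates.append((centered_rank_sum(merged), name, merged))
        if not candidates:
            break
        score, name, merged = max(candidates, key=lambda item: item[0])
        if score <= best:
            break  # no remaining feature improves the union
        chosen.append(name)
        union, best = merged, score

    return chosen, best
```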


Provides users with a framework to easily run a wide range of cancer driver prediction methods on omics datasets and integrates results to obtain consensus predictions that have higher sensitivity and precision. ConsensusDriver uses Docker technology to significantly reduce the effort in installing and using different software packages, and it enables analysis on a personal computer for those who are not adept at using servers and Linux systems. It combines diverse driver prediction paradigms including popular methods such as MutSigCV, OncodriveFM, DriverNET, OncoIMPACT, fathmm and CHASM.


A method for uncovering the dominant effects of cancer-driver genes based on a partial covariance selection approach. Inspired by a convex optimization technique, DEOD estimates the dominant effects of candidate cancer-driver genes on the expression level changes of their target genes. It constructs a gene network as a directed-weighted graph by integrating DNA copy numbers, single nucleotide mutations, and gene expressions from matched tumor samples, and estimates partial covariances between driver genes and their target genes. Then, a scoring function to measure the cancer-driver score for each gene is applied.

MADGiC / Model-based Approach for identifying Driver Genes in Cancer

Identifying and prioritizing somatic mutations is an important and challenging area of cancer research that can provide new insights into gene function as well as new targets for drug development. MADGiC incorporates both frequency and functional impact criteria and accommodates a number of factors to improve the background model. Simulation studies demonstrate advantages of the approach, including a substantial increase in power over competing methods.


Identifies cancer driver genes based on linear annotations of biological regions such as protein domains. e-Driver uses information on three-dimensional (3D) structures of the mutated proteins to identify specific structural features. Then, the algorithm analyzes whether these features are enriched in cancer somatic mutations and, therefore, whether the genes that carry them are candidate driver genes. It was used to identify protein-protein interaction (PPI) interfaces enriched in somatic cancer mutations in a total of 103 genes (interface driver genes).

PRISMAD / Polymorphic Rates Indicate Somatic Mutations As Drivers

The catalogue of tumour-specific somatic mutations (SMs) is growing rapidly owing to the advent of next-generation sequencing. Identifying those mutations responsible for the development and progression of the disease, so-called driver mutations, will increase our understanding of carcinogenesis and provide candidates for targeted therapeutics. PRISMAD is a tool for annotating genes as candidates for harbouring somatic driver mutations.

GraphPAC / Graph Protein Amino acid Clustering

Identifies mutational clusters of amino acids in a protein while utilizing the protein's tertiary structure via a graph theoretical model. Using GraphPAC, we are able to detect novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of prior clustering based on current methods. Specifically, by utilizing the spatial information available in the Protein Data Bank (PDB) along with the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in well-known oncogenes such as EGFR and KRAS. Further, by utilizing graph theory to account for the tertiary structure, GraphPAC discovers clusters in DPP4, NRP1 and other proteins not identified by existing methods. GraphPAC provides an alternative to iPAC and an extension to current methodology when identifying potential activating driver mutations by utilizing a graph theoretic approach when considering protein tertiary structure.


Analyzes DNA-Whole Exome Sequencing (DNA-WES) and patient-matched RNA-seq to detect somatic mutations genome-wide. UNCeqR is an algorithm that detects somatic mutations within exons based on input of tumor and patient-matched germline sequence alignments. The algorithm applies the following steps to each genomic site within exons: (i) filter for high quality data, (ii) identify germline alleles from germline reads, (iii) use tumor sequences, (iv) if the major variant allele is an insertion or deletion, re-align nearby indel alleles, and (v) if the high quality variant filter is passed, apply a statistical test.