Computational protocol: Novel putative drivers revealed by targeted exome sequencing of advanced solid tumors

Protocol publication

[…] All analyses were performed independently of any company. Variant lists obtained from the sequencing vendor () were redacted and sent to the collaborative team at the Louisiana State University Health Sciences Center and the University of Alabama at Birmingham for analysis, preserving patient confidentiality. Variants were manually re-validated individually against the hg19 reference genome and then evaluated for a potential resulting cancer driver phenotype. Missense mutations were scored with two algorithms, CHASM and FATHMM, which are considered reliable predictors [, ].

Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM) is a computational method based on a Random Forest classifier trained to discriminate between driver missense mutations, culled mainly from the curated COSMIC database, and in silico generated passenger missense mutations in genes found mutated at least once in large-scale sequencing studies of different tumors []. The scores from the Random Forest classifier are used to generate Benjamini-Hochberg corrected p values, under the null hypothesis that the mutation being tested is not functionally related to tumor growth (that is, it is a passenger). The CHASM engine can be accessed at the CRAVAT web site (http://www.cravat.us).

The Functional Analysis through Hidden Markov Models (FATHMM) algorithm was first developed to predict the functional effects of protein missense variants in inherited diseases []. This method relies on the fact that hidden Markov models (HMMs) can capture position-specific information within a multiple sequence alignment of homologous sequences. Starting with manually curated HMMs representing the alignment of conserved protein domain families, a weighted/species-specific method incorporating “pathogenicity weights” was devised. These weights were derived from the relative frequencies of disease-associated and functionally neutral amino acid substitutions mapping onto conserved protein domains.
In a later adaptation, a cancer-specific weighting scheme was incorporated to potentiate the functional analysis of driver mutations, and the algorithm was also extended to evaluate mutations falling outside conserved protein domains [, ]. A web-based implementation of the cancer-specific model is available at http://fathmm.biocompute.org.uk.

High confidence potential drivers were defined as having both a FATHMM score ≤ -1.5 (corresponding to a specificity of 0.94 and a sensitivity of 0.80) and a CHASM p-value ≤ 0.03 (corresponding to a false discovery rate (FDR) ≤ 0.15). Low confidence potential drivers met only one of these requirements. Indels in tumor suppressor genes resulting in early termination of translation or a deleterious frameshift were considered high confidence potential cancer drivers. Mutations in acceptor and donor splice sites in tumor suppressor genes were considered high confidence drivers if 1) the substitution involved invariant nucleotides in splice consensus sequences and 2) the splice site was used for the generation of all the mature mRNAs from the particular gene. Because the driver potential of a particular variant is context-dependent, the actual driver likelihood of mutated tumor suppressors is based on known cases of haploinsufficiency or on a “worst case scenario”, in which events such as loss of heterozygosity or silencing of the wild type allele take place in at least a fraction of the tumor cells [].

Lastly, published reports regarding phenotypic consequences of particular mutations, included in , were then used to inform the assessment and/or validate the scoring criteria. Mutations reported in the COSMIC database (http://cancer.sanger.ac.uk/cosmic) as of 12/2017 are indicated with asterisks in Figs –.

We analyzed biospecimens from 44 deidentified, advanced solid tumors by targeted exome sequencing. A complete list of the variants identified in our study is presented in .
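The two-threshold rule for missense mutations described above can be written as a small function. This is only a minimal sketch of the stated cutoffs (FATHMM score ≤ -1.5, CHASM p-value ≤ 0.03), not the authors' actual code. The function name, the constant names, the `"none"` label for variants meeting neither requirement, and the example scores are all assumptions made for illustration.

```python
# Thresholds as stated in the text; the constant names are hypothetical.
FATHMM_MAX_SCORE = -1.5   # more negative FATHMM scores are more likely drivers
CHASM_MAX_P_VALUE = 0.03  # reported to correspond to an FDR <= 0.15


def classify_missense(fathmm_score: float, chasm_p_value: float) -> str:
    """Return 'high', 'low', or 'none' for a missense mutation.

    'high' means both requirements are met and 'low' means exactly one is.
    'none' (neither requirement met) is an assumed label; the text does not
    name this case.
    """
    fathmm_hit = fathmm_score <= FATHMM_MAX_SCORE
    chasm_hit = chasm_p_value <= CHASM_MAX_P_VALUE
    if fathmm_hit and chasm_hit:
        return "high"
    if fathmm_hit or chasm_hit:
        return "low"
    return "none"


# Hypothetical scores, for illustration only (not taken from the study).
examples = [(-2.1, 0.01), (-2.1, 0.20), (-0.4, 0.02), (0.3, 0.50)]
labels = [classify_missense(score, p) for score, p in examples]
```

This rule covers only missense mutations. Indels and splice-site mutations in tumor suppressor genes follow the separate criteria given in the text and are not handled by this sketch.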
In our dataset, counts of potential driver mutations per patient (including known candidate drivers present in COSMIC and high-confidence putative drivers identified by both FATHMM and CHASM algorithms) form a distribution with a range of 0 to 9, a median of 3, and an average of 3.8. Evidence from well-studied cases indicates that a small number of drivers is sufficient to confer a neoplastic phenotype. For instance, only three driver mutations are likely required for the development of lung and colorectal cancers []. Our results suggest that the number of drivers in advanced tumors resistant to standard-of-care treatments may not necessarily be higher than in primary tumors. However, the number of novel high-confidence putative drivers identified in our samples supports the hypothesis that treatment selects resistant clones carrying new driver mutations [–]. Out of a total of 159 putative driver mutations, 41 (26%) were previously described in COSMIC and scored as high-confidence candidates by our analysis. Twenty (13%) were previously described in COSMIC and scored as low-confidence candidates. Forty-four mutations (28%) were previously undescribed and scored as high-confidence candidate drivers, while 58 mutations (36%) were previously undescribed and scored as low-confidence candidate drivers (see ).

The cancer subtypes represented in our study are discussed individually below. […]
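The percentage breakdown of putative drivers reported above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the counts and the total stated in the text; the dictionary keys and variable names are illustrative assumptions. Dividing each count by the stated total of 159 reproduces the reported percentages after rounding, although the four counts as reported add up to 163 rather than 159. The sketch exposes that difference without attempting to resolve it.

```python
# Counts and total exactly as reported in the text; key names are hypothetical.
TOTAL_REPORTED = 159
counts = {
    ("in COSMIC", "high confidence"): 41,
    ("in COSMIC", "low confidence"): 20,
    ("previously undescribed", "high confidence"): 44,
    ("previously undescribed", "low confidence"): 58,
}

# Percentage of the stated total for each category, rounded to an integer.
percentages = {
    category: round(100 * n / TOTAL_REPORTED) for category, n in counts.items()
}

# Sum of the four category counts, for comparison with TOTAL_REPORTED.
category_sum = sum(counts.values())

for category, pct in percentages.items():
    print(category, counts[category], f"{pct}%")
print("sum of categories:", category_sum, "stated total:", TOTAL_REPORTED)
```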

Pipeline specifications

Software tools: CHASM, FATHMM, CRAVAT
Application: WES analysis
Organisms: Homo sapiens
Diseases: Neoplasms, Sarcoma, Colorectal Neoplasms