Computational protocol: Distinguishing between driver and passenger mutations in individual cancer genomes by network enrichment analysis

Protocol publication

[…] Sequence-based methods tend to be specialized for the analysis of specific mutation types such as copy number changes, gene fusions, or short insertions/deletions/substitutions. They evaluate concrete alterations of the nucleotide sequence in each gene and assess their potential impact on the protein’s function. If the predicted impact is strong, the mutation is deemed damaging.

In order to compare NEA with sequence-based methods, we uniformly submitted the point mutations from the GBM and OV sets to three web services that estimate the impact of point mutations on peptide chain functionality: Mutation Assessor [], PolyPhen-2 [, ], and SIFT []. The classifications produced by these services are not identical to the definition of driver mutations used in cancer biology, but they are what cancer scientists look for when using these resources. The methods’ efficiency was probably limited by a lack of information on homologous sequences and protein structures in their databases, although the extent to which they rely on such information varies.

The overall positive rate of the sequence-based methods was very high. For example, around 40% of the 4896 mutations in the OV genomes were predicted to be damaging by at least one of the three sequence-based tools (whereas as few as 3% were identified as potential drivers by 1-point-vs-MGS NEA at FDR < 0.1). On the other hand, one would expect to see functionally significant peptide chain alterations in the most frequently mutated genes. We did note that 70-80% of EGFR (in OV) and TP53 (in both OV and GBM) mutations were predicted to be either high impact or damaging. However, this was not the case for most of the other genes that were mutated in more than five (and up to 70) genomes each (NF1, IDH1, etc.).
Between 35 and 40% of mutations in these genes were classified as having “damaging” or “high” impact, which is no greater than the rate for genes in which mutations occurred only once (Additional file : Figure S7, discussed in the next section). Furthermore, these high positive rates did not decrease in the OV set, where mutations in ~13000 genes were analyzed. For example, the group of 3899 genes mutated in a single OV sample each also had a positive rate close to 40%. This was a counterintuitive result, especially when compared to the GBM set with its ~500 mutated genes pre-selected for known implication in cancer. Considering the sequence analysis results for the gold standard sets of frequently mutated and validated GBM drivers from Parsons et al. [] and Vogelstein et al. [], we also could not see much difference from the bulk of the genes. Finally, the three methods agreed as poorly with each other as they did with our method (Additional file : Figure S4). Notably, a smaller fraction of mutations was identified by all four methods, including our NEA (35 and 791 in GBM and OV, respectively), but these were mostly the well-known cancer drivers.

The set of drivers predicted by 1-point-vs-MGS NEA at FDR < 0.1 only exhibited a formal overlap with the results obtained using two tools, GeneSift and PolyPhen-2 (and only in the GBM and not in the OV dataset). The 1-point-vs-MGS NEA agreed quite well with mutation frequency: frequent mutators were classified as drivers much more often than single-case mutators (5-32% compared to 3-5% based on the pooled GBM and OV results). As explained above, the sequence-based tools did not show such agreement.

There were still remarkable exceptions. Many known drivers such as TP53, PTEN, RB1, etc. did not receive any significant 1-point-vs-MGS scores in genomes with few point mutations. On the other hand, these genes were detected by 1-vs-CPW analysis.
As another example, NEA missed nearly all of the mutations in IDH1, an enzyme with acknowledged involvement in GBM [], because of its unique global role in DNA methylation, which was not reflected in our network.

Further, despite the poor correspondence between the sequence tools and NEA, the results obtained with the latter were in good agreement with the basic variant classification data available from the TCGA MAF files. The strength of the difference between silent mutations and those classified as either “missense” or “nonsense” increased with the confidence of the NEA analysis: mutations with 1-point-vs-MGS NEA z-scores > 10 exhibited the most significant overlap with the “missense OR nonsense” category (p < 0.01 according to Fisher’s exact test for GBM and p < 0.00001 for OV; Additional file : Figure S5).

We conclude that sequence-based methods are likely to yield very high positive rates, which seems especially counter-intuitive when whole-exome mutation sets are analyzed (such as the OV set). Moreover, they are only consistently efficient for a few well-characterized genes such as EGFR and TP53. However, NEA also has some significant limitations, including its inapplicability to rare mutations in small mutation sets and to genes with very special roles (e.g. those that extend beyond well-characterized signaling pathways). […]
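
The two quantities used in this comparison, the share of mutations flagged by at least one sequence-based tool and the Fisher's exact test on the overlap between high NEA z-scores and the "missense OR nonsense" category, can be sketched as below. This is a minimal illustration and not the authors' pipeline: the function names, the input layout, and the toy counts are assumptions made for the example, and the one-sided (enrichment) form of the test is also an assumption, since the excerpt does not say which alternative was used.

```python
from math import comb


def any_tool_positive_rate(calls):
    """Fraction of mutations called damaging by at least one tool.

    `calls` maps a mutation id to {tool_name: bool}. This layout is a
    hypothetical convenience format, not the output of any real service.
    """
    if not calls:
        return 0.0
    flagged = sum(1 for per_tool in calls.values() if any(per_tool.values()))
    return flagged / len(calls)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in cell `a`.

    Assumed 2x2 table layout:
                                high NEA z   other
        missense OR nonsense        a          b
        silent                      c          d
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy data, invented purely for illustration.
toy_calls = {
    "mut1": {"MutationAssessor": False, "PolyPhen-2": True, "SIFT": False},
    "mut2": {"MutationAssessor": False, "PolyPhen-2": False, "SIFT": False},
    "mut3": {"MutationAssessor": True, "PolyPhen-2": True, "SIFT": True},
    "mut4": {"MutationAssessor": False, "PolyPhen-2": False, "SIFT": False},
}
toy_rate = any_tool_positive_rate(toy_calls)
toy_p = fisher_exact_greater(3, 1, 1, 3)
```

Counting a mutation as positive when any single tool flags it, as the excerpt does, yields an upper bound on each tool's individual rate, which is one reason the pooled figure of about 40% is so much higher than the NEA rate.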

Pipeline specifications

Software tools: Mutationassessor, PolyPhen, SIFT
Application: WES analysis
Diseases: Glioblastoma, Neoplasms, Ovarian Neoplasms