Computational protocol: Exome Sequencing Identifies Potentially Druggable Mutations in Nasopharyngeal Carcinoma

Protocol publication

[…] FASTQ files from whole-exome sequencing were obtained from BGI for further analysis. Sequencing reads from tumor and matched normal blood DNA were aligned separately to the human reference genome hg19 using the Burrows-Wheeler Aligner (BWA). Local realignment was performed using the Genome Analysis Toolkit (GATK), followed by removal of PCR duplicates using Picard. Next, variants from both normal and tumor samples were identified using the GATK pipeline. Briefly, base qualities were recalibrated, and the GATK UnifiedGenotyper was then used to call SNVs and Indels. Only well-mapped reads (mapping quality ≥30 and ≤3 mismatches within a 40-bp window) were used as input for the UnifiedGenotyper. Variants that passed additional quality filters (quality by depth ≥1.5, variant depth ≥2, total depth ≥10) were retained.

To identify somatic mutations (single nucleotide variants [SNVs] and short insertions and deletions [Indels]), variants identified from tumor and matched blood DNA samples were first compared with the dbSNP137, 1000 Genomes Project, Complete Genomics (cg69), and National Heart, Lung and Blood Institute (NHLBI; NIH, USA) databases to eliminate previously reported polymorphisms. Somatic mutations were then identified essentially by subtracting the variants found in normal DNA samples from those found in the tumor samples, and were annotated using ANNOVAR based on the NCBI RefSeq database. Only somatic mutations in exons or splice sites were analyzed further. All potential somatic mutations were manually inspected using the Integrative Genomics Viewer. Amino acid changes were annotated to the longest transcript of the gene, and the impact of somatic SNVs on protein function was predicted using SIFT and PolyPhen2. The putative non-synonymous mutations were analyzed for enriched functional groupings (Gene Ontology classification) using the Database for Annotation, Visualization and Integrated Discovery (DAVID).
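The variant-level filters and the tumor-minus-normal subtraction described above can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the `Variant` record, its field names, and the helper functions are hypothetical, and the set of known polymorphisms is assumed to have already been loaded from the population databases as (chromosome, position, ref, alt) tuples. Only the three thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record (field names are assumptions)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qd: float            # quality by depth
    variant_depth: int   # reads supporting the alternate allele
    total_depth: int     # total reads covering the site

    @property
    def key(self):
        # Identity of a variant for comparison between samples and databases.
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_quality(v, min_qd=1.5, min_variant_depth=2, min_total_depth=10):
    """Apply the three thresholds stated in the protocol."""
    return (
        v.qd >= min_qd
        and v.variant_depth >= min_variant_depth
        and v.total_depth >= min_total_depth
    )


def somatic_candidates(tumor, normal, known_polymorphisms):
    """Return tumor variants that pass the filters, are not previously
    reported polymorphisms, and are absent from the matched normal calls."""
    normal_keys = {v.key for v in normal if passes_quality(v)}
    return [
        v for v in tumor
        if passes_quality(v)
        and v.key not in known_polymorphisms
        and v.key not in normal_keys
    ]
```

In this sketch the normal calls are filtered with the same thresholds before subtraction, following the order given in the text. A stricter variant would subtract every normal call regardless of quality, which avoids reporting a germline variant as somatic just because it was poorly covered in blood.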
The list of genes identified as harboring SNVs and Indels was further analyzed, and a subset was selected for external validation by targeted sequencing of DNA samples from the Prevalence set (described below). [...] From our Discovery set, a total of 30 candidate mutations were selected for validation by Sanger sequencing. For the Prevalence set, recurrent variants and the five most frequently mutated genes were selected for Sanger validation. Where matched blood DNA was available, somatic or germline status was also determined by Sanger sequencing. Primers specific to the regions of interest harboring the mutations were designed using Primer3 software; the primer sequences are listed in the original publication. Because of limited sample material, DNA samples from TSH were whole-genome amplified using the Repli-G Mini kit (Qiagen GmbH, Germany) before PCR amplification for Sanger sequencing. PCR amplification was conducted using Platinum Supermix (Invitrogen, CA, USA) for samples from TSH, or GoTaq Green Master Mix (Promega, Madison, USA) for samples from KLH, PH, SGH and QESH. PCR cycling parameters were one cycle at 95 °C for 5 min; 40 cycles of 95 °C for 30 s, 55 °C to 60 °C for 30 s and 72 °C for 1 min; and one cycle at 72 °C for 10 min. Sequencing was performed with ABI BigDye Terminator v3.1 (Life Technologies, CA, USA). The sequence chromatograms were visually inspected with Mutation Surveyor v4.0.4 (Softgenetics, State College, USA), Chromas Lite 2.1.1 or BioEdit software. […]
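As a quick check on the cycling parameters above, the short sketch below encodes the program and sums its hold times. The `Step` type and function name are hypothetical, and ramp times between temperatures are ignored, so the real run on a thermocycler would take somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in the PCR program (hypothetical helper type)."""
    label: str
    temp_c: tuple   # (low, high) in °C; equal values mean a fixed temperature
    seconds: int


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum of hold times only; ramping between temperatures is not modeled."""
    per_cycle = sum(step.seconds for step in cycle_steps)
    return initial.seconds + n_cycles * per_cycle + final.seconds


initial = Step("initial denaturation", (95, 95), 5 * 60)
cycle = [
    Step("denaturation", (95, 95), 30),
    Step("annealing", (55, 60), 30),   # primer-dependent, 55 °C to 60 °C
    Step("extension", (72, 72), 60),
]
final = Step("final extension", (72, 72), 10 * 60)

total = total_hold_seconds(initial, cycle, n_cycles=40, final=final)
print(total / 60)  # 95.0 minutes of hold time
```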

Pipeline specifications

Software tools: Primer3, BioEdit
Application: qPCR
Diseases: Carcinoma, Neoplasms