Protocol publication

[…] Tumors were profiled for genomic alterations in 410 key cancer-associated genes using our custom deep-sequencing MSK-IMPACT assay. Custom DNA probes were designed for targeted sequencing of all exons and selected introns of 410 oncogenes, tumor suppressor genes, and members of pathways deemed actionable by targeted therapies. Genomic DNA from the tumor, extracted on a Hamilton Chemagic workstation using formalin-fixed paraffin-embedded tissue DNA kits (Perkin Elmer), was subjected to sequence library preparation and exon capture (NimbleGen). Up to 30 barcoded sequence libraries were pooled at equimolar concentrations and input into a single exon capture reaction, as previously described []. Pooled libraries containing captured DNA fragments were subsequently sequenced on the Illumina HiSeq 2500 system as 2 × 100 bp paired-end reads. Sequence data were demultiplexed using BCL2FASTQ v1.8.3 (Illumina), and vestigial adapter sequences were removed from the 3′ end of sequence reads. Reads were aligned in paired-end mode to the hg19 b37 version of the genome using BWA-MEM (Burrows-Wheeler Alignment tool). Local realignment and quality score recalibration were performed using the Genome Analysis Toolkit (GATK) according to GATK best practices []. Samples were subjected to a series of computational quality control steps to ensure genomic concordance between tumor and normal specimens from a control group of normal individuals, detect the presence of tumor DNA in the normal sample, and monitor contamination involving DNA from different patients. Unpaired-sample variant calling was performed on the tumor sample and control normals to identify point mutations/single nucleotide variants (SNVs) and small insertions/deletions (indels). MuTect (version 1.1.4) was used for SNV calling, and SomaticIndelDetector, a tool in GATK v2.3.9, was used for detecting indel events.
Variants were subsequently annotated using ANNOVAR, and annotations relative to the canonical transcript for each gene (derived from a list of known canonical transcripts obtained from the UCSC genome browser) were reported. Because this tumor had no matched normal sample, variants with a minor allele frequency > 1% in the 1000 Genomes cohort were also removed, as they were more likely to be common population polymorphisms than somatic mutations. Annotated SNV and indel calls were subjected to a series of filtering steps to ensure that only high-confidence calls were admitted to the final step of manual review. First, prior knowledge from the literature was incorporated in the analysis through a ‘two-tiered’ variant filtering scheme: variants corresponding to known hotspot mutations with extensive supporting evidence in the literature (at least 5 mentions in the COSMIC database) were considered ‘first-tier’ events. These variants were subjected to lower requirements on coverage, number of mutant reads and variant frequency to be considered high-confidence calls. Second, variants detected in more than 20% of a set of historical normal samples (i.e. ≥ 3 mutant reads and > 1% variant frequency) were considered likely artifacts and removed. Third, we employed the following thresholds on coverage depth (DP), number of mutant reads (AD) and variant frequency (VF) for rejecting false positive calls. First-tier variants (i.e. well-characterized hotspot mutations) were considered in a separate class from novel second-tier variants: first-tier variants were filtered using the criteria DP ≥ 20X, AD ≥ 8 and VF ≥ 2%, whereas second-tier variants required DP ≥ 20X, AD ≥ 10 and VF ≥ 5%. Variant calls passing these filtering steps and resulting in changes to the protein primary sequence (i.e. non-synonymous: missense and nonsense, splice site, frameshift indel, inframe indel) were subjected to manual review using the Integrative Genomics Viewer (IGV) []. This enabled the elimination of additional likely false positive calls (e.g. variants supported by reads with low mapping quality and/or many low-quality bases) produced by sequencing-induced artifacts. Finally, normal variants were excluded by searching a database annotating human SNPs. […]
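
The filtering cascade described above can be summarized in a short Python sketch. This is an illustration only, not the authors' pipeline code: the `Variant` record, its field names, and the helper functions are hypothetical, and the variant frequency is assumed here to be mutant reads divided by coverage depth, which the text does not state explicitly. Only the numeric thresholds (1% population frequency, 20% of historical normals, 5 COSMIC mentions, and the DP/AD/VF cut-offs per tier) are taken from the protocol.

```python
from dataclasses import dataclass

# Numeric thresholds quoted in the protocol text.
MAX_POPULATION_MAF = 0.01         # 1000 Genomes minor allele frequency cut-off
MAX_NORMAL_PANEL_FRACTION = 0.20  # fraction of historical normals showing the variant
MIN_COSMIC_MENTIONS = 5           # minimum COSMIC mentions for a first-tier hotspot
TIER_THRESHOLDS = {
    1: {"dp": 20, "ad": 8, "vf": 0.02},   # first tier: known hotspots
    2: {"dp": 20, "ad": 10, "vf": 0.05},  # second tier: novel variants
}


@dataclass
class Variant:
    """Hypothetical record for one annotated SNV or indel call."""
    depth: int                            # DP: coverage depth at the site
    alt_reads: int                        # AD: number of mutant reads
    cosmic_mentions: int = 0              # supporting entries in COSMIC
    population_maf: float = 0.0           # MAF in the 1000 Genomes cohort
    normal_panel_fraction: float = 0.0    # fraction of historical normals with the variant

    @property
    def variant_frequency(self) -> float:
        # Assumption: VF = AD / DP (not spelled out in the protocol).
        return self.alt_reads / self.depth if self.depth else 0.0


def assign_tier(variant: Variant) -> int:
    """Return 1 for well-supported hotspots, 2 for everything else."""
    return 1 if variant.cosmic_mentions >= MIN_COSMIC_MENTIONS else 2


def passes_filters(variant: Variant) -> bool:
    """Apply the population, artifact, and tiered threshold filters in order."""
    if variant.population_maf > MAX_POPULATION_MAF:
        return False  # likely a common population polymorphism
    if variant.normal_panel_fraction > MAX_NORMAL_PANEL_FRACTION:
        return False  # likely a recurrent sequencing artifact
    limits = TIER_THRESHOLDS[assign_tier(variant)]
    return (
        variant.depth >= limits["dp"]
        and variant.alt_reads >= limits["ad"]
        and variant.variant_frequency >= limits["vf"]
    )


if __name__ == "__main__":
    hotspot = Variant(depth=200, alt_reads=8, cosmic_mentions=12)
    novel = Variant(depth=200, alt_reads=8)
    print(passes_filters(hotspot), passes_filters(novel))
```

The same call with 8 mutant reads at 200X coverage is kept when it matches a known hotspot but rejected as a novel variant, which is the practical effect of the two-tier scheme. Calls that pass would still go on to manual review in IGV, which this sketch does not model.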

Pipeline specifications

Software tools: BWA, GATK, MuTect, SomaticIndelDetector, ANNOVAR, IGV
Databases: gnomAD, UCSC Genome Browser
Application: Genome data visualization
Diseases: Heart Diseases, Neoplasms, Sarcoma