Computational protocol: Systematic pan-cancer analysis of somatic allele frequency

Protocol publication

[…] All datasets were generated by paired-end sequencing on an Illumina HiSeq platform. Sequencing reads aligned to the human reference genome (hg38), in Binary Alignment Map format (.bam), and the Simple Nucleotide Variation mutation annotation file (SNV.maf) were downloaded from the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/) and processed through an in-house pipeline. The RNA and DNA alignments, together with the variant lists, were processed with RNA2DNAlign, which produced variant and reference read counts for all variant positions in all four datasets (normal exome, normal transcriptome, tumor exome and tumor transcriptome). Selected read counts were visually examined using the Integrative Genomics Viewer. We excluded from further analyses variants that (1) were covered by fewer than 10 sequencing reads in the tumor DNA or the RNA sequencing data; (2) resided in known imprinted regions; or (3) were present in the normal DNA or RNA, suggesting germline origin. Variants located on the X chromosome or in stably imprinted autosomal genes were also excluded from the analyses. For the NMD analysis, short-lived (<1 h half-life) transcripts were identified based on Tani et al. Gene expression was quantified using the Cufflinks package from the Tuxedo suite, as we have previously described. […]

Functional annotations and conservation scores were extracted using SeattleSeq Annotation 147 (http://snp.gs.washington.edu/SeattleSeqAnnotation147/index.jsp). Pathogenicity was modeled using PolyPhen, CADD and FATHMM, and conservation was assessed based on GERP scores. Transcription factor binding sites were analyzed using TRANSFAC 7.0. […]
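The exclusion criteria described above can be sketched as a small filter over per-variant read counts. This is an illustrative sketch only, not the authors' in-house pipeline and not the RNA2DNAlign output format. The `Variant` record, its field names, the gene names, and the gene-level imprinting lookup are all hypothetical. Two interpretations are assumed: coverage is taken as reference plus variant reads, and a variant counts as "present in the normal" if the normal DNA or RNA has at least one variant read. The allele frequency is computed as variant reads divided by total reads.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

MIN_COVERAGE = 10  # minimum read count stated in the protocol

Counts = Tuple[int, int]  # (reference reads, variant reads)


@dataclass(frozen=True)
class Variant:
    """Read counts for one variant position (hypothetical record layout)."""
    chrom: str
    pos: int
    gene: str
    normal_dna: Counts
    normal_rna: Counts
    tumor_dna: Counts
    tumor_rna: Counts


def coverage(counts: Counts) -> int:
    """Total reads at the position (assumed: reference + variant)."""
    return counts[0] + counts[1]


def vaf(counts: Counts) -> Optional[float]:
    """Variant allele frequency, or None when the position is uncovered."""
    total = coverage(counts)
    return counts[1] / total if total else None


def passes_filters(v: Variant, imprinted_genes: Set[str]) -> bool:
    """Apply the exclusion criteria from the protocol text."""
    # (1) fewer than 10 reads in tumor DNA or tumor RNA
    if coverage(v.tumor_dna) < MIN_COVERAGE or coverage(v.tumor_rna) < MIN_COVERAGE:
        return False
    # (2) imprinted region (assumed here to be a gene-level lookup)
    if v.gene in imprinted_genes:
        return False
    # (3) variant reads in normal DNA or RNA suggest germline origin
    if v.normal_dna[1] > 0 or v.normal_rna[1] > 0:
        return False
    # X-chromosome variants are excluded
    if v.chrom in {"X", "chrX"}:
        return False
    return True


def filter_variants(variants: Iterable[Variant], imprinted_genes: Set[str]) -> List[Variant]:
    return [v for v in variants if passes_filters(v, imprinted_genes)]


# Made-up example records; gene names are placeholders.
example = [
    Variant("chr1", 1000, "GENE_A", (30, 0), (25, 0), (20, 20), (5, 15)),
    Variant("chr2", 2000, "GENE_B", (30, 0), (25, 0), (4, 3), (10, 10)),      # low coverage
    Variant("chr3", 3000, "GENE_C", (15, 15), (12, 10), (20, 20), (10, 10)),  # in normal
    Variant("chrX", 4000, "GENE_D", (30, 0), (25, 0), (20, 20), (10, 10)),    # X chromosome
    Variant("chr11", 5000, "GENE_E", (30, 0), (25, 0), (20, 20), (10, 10)),   # imprinted
]
kept = filter_variants(example, imprinted_genes={"GENE_E"})

for v in kept:
    print(v.chrom, v.pos, v.gene, vaf(v.tumor_dna), vaf(v.tumor_rna))
```

Keeping each criterion as a separate early return makes it easy to change one threshold or lookup without touching the others, for example if imprinting were defined by genomic intervals rather than gene names.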

Pipeline specifications

Software tools: SeattleSeq Annotation, PolyPhen, CADD, FATHMM, GERP
Databases: TRANSFAC
Application: WGS analysis
Diseases: Neoplasms
Chemicals: Nucleotides