Computational protocol: Transcriptome-wide analysis of alternative RNA splicing events in Epstein-Barr virus-associated gastric carcinomas

Protocol publication

[…] Detailed information on the GC samples can be obtained from the original manuscript describing the comprehensive evaluation of 295 primary gastric adenocarcinomas as part of the TCGA project []. Essentially, each frozen primary tumour specimen had a companion normal tissue specimen []. Adjacent non-tumour gastric tissue was also submitted for a subset of cases []. Pathology quality control was performed on each tumour and adjacent normal tissue (if available) []. Hematoxylin and eosin (H&E) stained sections from each sample were subjected to pathology analysis to confirm that the tumour specimen was histologically consistent with gastric cancer and that the adjacent tissue specimen contained no tumour cells [].

RNA-Seq samples from TCGA were obtained through the CGHub data portal (https://cghub.ucsc.edu/). Since only BAM files were available, a custom script was used to generate valid FASTQ files. The sequence reads were then aligned to the transcriptome reference sequence database UCSCGene Hg19 using the Bowtie v2 aligner (default parameters). The associated gene isoforms were quantified in transcripts per million (TPM) using RSEM for each sample [,]. RSEM uses an Expectation-Maximization (EM) algorithm as its statistical model, which allows reads mapping to multiple transcripts to be included in the quantification.

Alternative splicing events were automatically identified and further quantified using the percent-spliced-in (PSI, Ψ) value, based on the long (L) and short (S) forms of all splicing events present, using the equation below:

$$\Psi = \frac{L}{L + S}$$

For each splicing event in a given gene (cassette exon, mutually exclusive exons, alternative 5’ and 3’ splice sites, etc.), a PSI value was computed as the ratio of the long form to the total (short form plus long form) present, to determine exon inclusion, intron retention, differential splice-site choice, etc.
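The PSI ratio defined above can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the function name `psi`, the use of TPM values as the abundance of each form, the choice to return `None` when neither form is detected, and the sample values are all assumptions made for the example.

```python
from typing import Optional


def psi(long_form: float, short_form: float) -> Optional[float]:
    """Return PSI = L / (L + S), or None when neither form is detected."""
    if long_form < 0 or short_form < 0:
        raise ValueError("abundances must be non-negative")
    total = long_form + short_form
    if total == 0:
        # Event not expressed in this sample: PSI is undefined (assumed convention).
        return None
    return long_form / total


# Hypothetical abundances (e.g. TPM) of the long and short forms of one
# splicing event in two samples.
tumour_psi = psi(long_form=30.0, short_form=10.0)
normal_psi = psi(long_form=5.0, short_form=15.0)
print(tumour_psi, normal_psi)
```

A PSI close to 1 means the long form dominates in that sample, and a PSI close to 0 means the short form dominates.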
For instance, the long form of a cassette exon would be its inclusion, and the short form would be its exclusion from the mature transcript. [...]

The splicing patterns of selected genes were visualized using the FAST-DB or EASANA suite. DNA sequences of representative transcripts presenting short and long isoforms were downloaded and translated into proteins using the ExPASy translation tool []. Predicted proteins were then compared using Multalin (truncation and frameshift events) [], PFAM (loss or appearance of functional domains) [], and NLS Mapper (loss or gain of nuclear localization signals) []. [...]

The PROGgeneV2 prognostic biomarker identification tool [] was used to study the implications of splicing factor gene expression for the overall survival of GC patients. The preprocessed dataset from TCGA (including RNA-Seq data and clinicopathological features) was used for analysis. The Cox proportional hazards model was used to calculate the hazard ratio and p-value of each parameter, and the median gene expression values were used as bifurcation points. A p-value of less than 0.05 was considered significant. […]
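The survival analysis itself was run in the PROGgeneV2 web tool, whose internals are not described here. As a rough sketch of the idea only, the Python below splits patients at the median expression of one gene and fits a Cox proportional hazards model with that single binary covariate by Newton-Raphson on the partial likelihood. The function names, the rule that values equal to the median go to the low group, the Breslow-style handling of tied event times, the Wald test for the p-value, and all data values are assumptions for illustration.

```python
import math
from statistics import median


def median_split(expression):
    """Label each patient 1 (expression above the median) or 0 (otherwise)."""
    cutoff = median(expression)
    return [1 if value > cutoff else 0 for value in expression]


def cox_binary(times, events, groups, iterations=25):
    """Fit a one-covariate Cox model; return (hazard ratio, Wald p-value).

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    groups : binary covariate (1 = high expression, 0 = low)
    """
    beta, info = 0.0, 0.0
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, groups):
            if not d_i:
                continue  # censored patients only contribute to risk sets
            s0 = s1 = 0.0
            for t_j, x_j in zip(times, groups):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    weight = math.exp(beta * x_j)
                    s0 += weight
                    s1 += weight * x_j
            p = s1 / s0  # weighted share of the high group in the risk set
            score += x_i - p  # first derivative of the log partial likelihood
            info += p * (1.0 - p)  # minus second derivative (binary covariate)
        if info == 0.0:
            raise ValueError("no informative events: both groups must be at risk")
        step = score / info
        beta += step
        if abs(step) < 1e-9:
            break
    z = beta * math.sqrt(info)  # Wald statistic: beta / standard error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(beta), p_value


# Hypothetical data: expression of one splicing factor, survival time in
# months, and event indicator (1 = death observed, 0 = censored).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
times = [10, 12, 5, 14, 3, 6, 4, 11]
events = [1, 1, 1, 0, 1, 1, 1, 1]

groups = median_split(expression)
hazard_ratio, p_value = cox_binary(times, events, groups)
print(f"HR = {hazard_ratio:.2f}, p = {p_value:.3f}, significant = {p_value < 0.05}")
```

A hazard ratio above 1 means the high-expression group has the higher risk of death. For real analyses, an established survival package is a safer choice than this hand-rolled fit.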

Pipeline specifications

Software tools EASANA, MultAlin, PROGgene
Databases Pfam, TCGA Data Portal, FAST DB, ExPASy
Applications RNA-seq analysis, Transcriptome data visualization
Organisms Homo sapiens
Diseases Neoplasms, Stomach Neoplasms