Computational protocol: Radiogenomic Analysis of Oncological Data: A Technical Survey


Protocol publication

[…] Particular interest in oncology is focused on transcriptomics, the identification and quantification of RNA in cells, tissues, or biological fluids, which is a powerful tool for assessing specific biological activities []. A single RNA-Seq experiment can investigate not only gene expression but also alternative splicing [], novel transcripts [,], allele-specific expression [], gene fusions [], and genetic variations [,]. Moreover, RNA-Seq can provide further information on transcriptome dynamics, such as RNA editing, small insertions/deletions, exon connections, non-coding RNAs, and small RNAs []. Today, three widely accepted, commercially available NGS platforms are used for RNA-Seq: the 454 GS FLX (reads up to 400 bp; Roche, Basel, Switzerland), the Genome Analyzer II (paired-end reads up to 100 bp; Illumina, San Diego, CA, USA), and SOLiD (reads up to 35–50 bp; Applied Biosystems, Foster City, CA, USA) [,]. Although each platform works differently, they all rely on similar principles: shearing the target nucleic acids into small fragments, binding individual molecules to a solid surface, amplifying each molecule into a cluster, copying one base at a time, and detecting a different signal for each nucleotide base. Most platforms can only sequence DNA, so RNA molecules are first reverse transcribed into cDNA, after which the RNA template is removed. The end result of the process is a sequence of images in which each lit spot corresponds to a cluster and its color identifies the base type [].

The first step of RNA-Seq data analysis is quality control of the raw reads, in particular the assessment of sequence quality, GC content, overrepresented k-mers, and duplicated reads, in order to detect sequencing errors, PCR artifacts, or contamination [].
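The quality-control metrics listed above (sequence quality, GC content, overrepresented k-mers, duplicated reads) can be illustrated with a minimal Python sketch. It assumes unwrapped 4-line FASTQ records with Phred+33 quality encoding; the k-mer length and the `min_share` threshold are hypothetical illustration values, and dedicated QC tools use more elaborate statistics than these simple per-read averages.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from 4-line FASTQ records (assumes no line-wrapped sequences)."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it), next(it), next(it)
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header[1:], seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 quality encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among called A/C/G/T bases (N is ignored)."""
    called = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in called) / len(called) if called else 0.0


def duplicate_fraction(seqs: List[str]) -> float:
    """Share of reads whose exact sequence already occurred earlier."""
    return 1 - len(set(seqs)) / len(seqs) if seqs else 0.0


def overrepresented_kmers(seqs: List[str], k: int = 4, min_share: float = 0.2) -> Dict[str, float]:
    """k-mers whose share of all k-mer occurrences is at least min_share (hypothetical cutoff)."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items() if n / total >= min_share}


def qc_summary(reads: List[Read]) -> Dict[str, float]:
    """Average per-read quality and GC content, plus the exact-duplicate fraction."""
    seqs = [r.seq for r in reads]
    return {
        "n_reads": len(reads),
        "mean_quality": sum(mean_phred(r.qual) for r in reads) / len(reads),
        "mean_gc": sum(gc_fraction(s) for s in seqs) / len(seqs),
        "duplicate_fraction": duplicate_fraction(seqs),
    }


# Toy input: r2 duplicates r1; r3 is GC-rich with low qualities ('#' = Phred 2).
example = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "GGGGCCCC", "+", "########",
]
reads = list(parse_fastq(example))
summary = qc_summary(reads)
kmers = overrepresented_kmers([r.seq for r in reads])
print(summary, kmers)
```

On this toy input the duplicated read r2 raises the duplicate fraction to one third, and the repeated `ACGT` motif is the only 4-mer above the illustrative 20% cutoff.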
The percentage of mapped reads is an important indicator of sequencing quality/accuracy and of the absence of contaminating DNA [,]. After quality control, NGS data analysis proceeds by mapping the sequence reads: reads are aligned to a reference genome or to reference transcripts, or assembled de novo without a reference genomic sequence, to produce a genome-scale transcription map consisting of the transcriptional structure and/or the expression level of each gene. When a reference genome is available, RNA-Seq analysis maps the reads onto the reference genome or transcriptome, although this limits the discovery of new transcripts. If an organism does not have a sequenced genome, a de novo assembly approach is necessary to produce a genome-scale transcriptional map [,,,]. For the identification of novel transcripts, several software packages and algorithms are used to assess splice junctions and transcription start and end sites [,,,,,]. Gene prediction tools, such as Augustus [], can exploit RNA-Seq data to better annotate protein-coding transcripts [].

The second phase of RNA-Seq analysis is the quantification of transcript expression using programs such as HTSeq-count [] or featureCounts [], which aggregate the raw counts of mapped reads. Quantitative gene expression data from RNA-Seq have been shown to be comparable to those from microarrays, but with a wider dynamic range and a lower detection limit for low-expressed transcripts [] ().

RNA-Seq is also used to study the biological role and signature of small RNAs (sRNAs).
Although sRNA-Seq libraries are rarely sequenced as deeply as classical RNA-Seq libraries, and their bioinformatics analysis differs from standard RNA-Seq protocols, the resulting sRNA reads are still aligned to a genome or transcriptome reference using tools such as Bowtie2 [], STAR [], or the Burrows-Wheeler Aligner (BWA) [].

Innovations in RNA-Seq have made quantitative transcriptome analysis of single cells possible, even when a large number of cells are processed in the same run []. Furthermore, methods that integrate DNA whole-exome sequencing (DNA-WES), or ChIP-Seq, with RNA-Seq have improved mutation detection performance [,,]. Finally, an in situ RNA-Seq method has also been developed for preserved tissue sections and cell samples [].

RNA-Seq offers several advantages over other transcriptomics methods [,], providing high-throughput construction of single-base-resolution expression profiles with low background noise from a small amount of starting RNA. It can also generate millions of reads in a single run. Nevertheless, few studies have combined RNA-Seq with radiomic technologies (MRI) to address clinical issues; these are reported in . Indeed, although RNA-Seq outperforms microarray analysis in terms of sensitivity, specificity, and abundance estimation, microarrays are still more widely used than RNA-Seq. This is probably due to cost, run time, and the large volume of data, which requires dedicated data storage platforms (Big Data). In light of this, although RNA-Seq is promising, technological improvements that reduce costs and improve data processing and storage, together with gold standards for analysis, are needed to make the best use of this powerful platform in research laboratories and clinics. […]
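The quantification step, in which HTSeq-count or featureCounts aggregate mapped reads per gene, and the percentage of mapped reads used as a quality indicator can be sketched in Python as follows. This is not the implementation of either tool: it is a simplified rule loosely modeled on HTSeq-count's default "union" behavior for unspliced, single-block alignments, ignoring strandedness, spliced reads, and multi-mapping reads. The gene names, coordinates (half-open, 0-based), and the `Gene`/`Alignment` types are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Alignment(NamedTuple):
    read_id: str
    chrom: Optional[str]  # None marks an unmapped read
    start: int = 0
    end: int = 0


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def count_reads(alignments: Iterable[Alignment],
                genes: List[Gene]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign each read to the single gene it overlaps; tally the other outcomes separately."""
    counts = {g.name: 0 for g in genes}
    special = {"no_feature": 0, "ambiguous": 0, "not_aligned": 0}
    for aln in alignments:
        if aln.chrom is None:
            special["not_aligned"] += 1
            continue
        # Linear scan for clarity; real tools use interval indexes.
        hits = {g.name for g in genes
                if g.chrom == aln.chrom and overlaps(aln.start, aln.end, g.start, g.end)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            special["ambiguous"] += 1   # overlaps more than one gene
        else:
            special["no_feature"] += 1  # aligned, but outside every annotated gene
    return counts, special


def mapped_fraction(alignments: List[Alignment]) -> float:
    """Share of reads that aligned anywhere (the mapped-read indicator as a fraction)."""
    if not alignments:
        return 0.0
    return sum(a.chrom is not None for a in alignments) / len(alignments)


genes = [
    Gene("GENE_A", "chr1", 100, 500),
    Gene("GENE_B", "chr1", 400, 900),
    Gene("GENE_C", "chr2", 0, 300),
]
alignments = [
    Alignment("r1", "chr1", 120, 170),  # GENE_A only
    Alignment("r2", "chr1", 450, 500),  # GENE_A and GENE_B -> ambiguous
    Alignment("r3", "chr1", 600, 650),  # GENE_B only
    Alignment("r4", "chr2", 10, 60),    # GENE_C
    Alignment("r5", "chr3", 0, 50),     # no annotated gene on chr3
    Alignment("r6", None),              # unmapped
]
counts, special = count_reads(alignments, genes)
rate = mapped_fraction(alignments)
print(counts, special, f"mapped: {rate:.1%}")
```

Discarding reads that overlap two genes, rather than splitting them, keeps each count an integer number of unambiguously assigned reads, which is the kind of raw count table that downstream expression analysis starts from.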
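The alignment of sRNA reads with Bowtie2, STAR, or BWA can be sketched as the assembly of command lines for each aligner. This Python sketch only builds the commands and, by default, prints them instead of running them. The index paths, file names, and thread count are hypothetical placeholders; the indexes are assumed to have been built beforehand (`bowtie2-build`, `STAR --runMode genomeGenerate`, `bwa index`), and sRNA-specific steps such as adapter trimming or seed-length tuning are deliberately left out. For BWA it uses the `aln`/`samse` pair (BWA-backtrack), which the BWA documentation describes as designed for short reads, rather than `bwa mem`.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]
    stdout: Optional[Path] = None  # file that receives the tool's stdout, if any


def bowtie2_steps(index: str, reads: str, out_sam: str, threads: int = 4) -> List[Step]:
    # bowtie2 writes SAM to the file given with -S.
    return [Step(["bowtie2", "-p", str(threads), "-x", index, "-U", reads, "-S", out_sam])]


def star_steps(genome_dir: str, reads: str, prefix: str, threads: int = 4) -> List[Step]:
    # STAR writes <prefix>Aligned.out.sam plus log files by default.
    return [Step(["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
                  "--readFilesIn", reads, "--outFileNamePrefix", prefix])]


def bwa_backtrack_steps(index: str, reads: str, out_sam: str, threads: int = 4) -> List[Step]:
    # BWA-backtrack: `bwa aln` emits a .sai file on stdout, `bwa samse` converts it to SAM.
    sai = Path(out_sam).with_suffix(".sai")
    return [
        Step(["bwa", "aln", "-t", str(threads), index, reads], stdout=sai),
        Step(["bwa", "samse", index, str(sai), reads], stdout=Path(out_sam)),
    ]


def run(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each step; return the shell-style command lines."""
    shown = []
    for step in steps:
        line = shlex.join(step.argv) + (f" > {step.stdout}" if step.stdout else "")
        shown.append(line)
        if dry_run:
            print(line)
        elif step.stdout is not None:
            with open(step.stdout, "wb") as fh:
                subprocess.run(step.argv, stdout=fh, check=True)
        else:
            subprocess.run(step.argv, check=True)
    return shown


# Hypothetical index and file names; dry_run=True only prints the commands.
commands = run(bwa_backtrack_steps("ref/genome", "sample_srna.fq", "sample_srna.sam"))
```

Keeping each command as an argument list, rather than a single shell string, avoids quoting problems when `subprocess.run` executes it; the shell-style rendering is used only for display.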

Pipeline specifications

Software tools: HTSeq, Bowtie2, STAR, BWA
Application: RNA-Seq analysis
Diseases: Neoplasms