Computational protocol: Identification of a candidate prognostic gene signature by transcriptome analysis of matched pre- and post-treatment prostatic biopsies from patients with advanced prostate cancer

Protocol publication

[…] The FastQC package was used to assess the quality of raw reads, which were then mapped to human genome assembly hg19 using TopHat version 1.4.1 with a junctions library derived from Ensembl version 68. Quality control was performed on all samples by examining the following parameters: (a) the percent of reads uniquely mapping to the genome; (b) the percent of reads mapping to known protein coding sequence; (c) the number of exon junctions identified; (d) the percent of spliced reads; and (e) the number of genes with 90% base coverage (Additional file: Table S1). TopHat-Fusion version 0.1.0 was used to identify gene fusions. HTSeq version 0.5.3 was used to count the number of reads mapping to each gene from Ensembl version 68, as input for differential expression analysis. The TMM (trimmed mean of M-values) method was used to normalise read counts, and differential expression was tested using a paired generalized linear model design with the edgeR package from Bioconductor version 2.11. The Circos plot was generated using RCircos version 1.1.2. Correlations were identified using Pearson's product-moment correlation coefficient (p < 0.05). Enriched KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were identified by downloading gene-pathway associations and testing each pathway for enrichment in significantly up- and down-regulated genes (FDR < 0.05) with a transcript length-corrected Wallenius approximation as implemented by the GOseq package for Bioconductor 3.0. Pathways were deemed to be enriched if the enrichment over background was at least 2-fold and the FDR was < 0.05. Gene lists were uploaded to cBioPortal to study gene expression changes in all prostate tumours with mRNA expression data (n = 150) from the Memorial Sloan Kettering Cancer Center (MSKCC) Prostate Oncogenome Project dataset, using an mRNA Z-score threshold of ±1.6 as compared with normal prostate samples.
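The pathway enrichment criterion described above (at least 2-fold enrichment over background and FDR < 0.05) can be illustrated with a minimal sketch. This is not the GOseq method used in the protocol: it swaps in a plain hypergeometric test with no transcript-length correction, followed by a Benjamini-Hochberg adjustment. All function names, gene identifiers, and pathway names below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_pathways(pathways, de_genes, background, min_fold=2.0, max_fdr=0.05):
    """Keep pathways with fold enrichment >= min_fold and FDR < max_fdr.

    Assumption: unweighted hypergeometric test, NOT the length-corrected
    Wallenius approximation that GOseq applies.
    """
    universe = set(background)
    de = set(de_genes) & universe
    N, n = len(universe), len(de)
    names, folds, pvals = [], [], []
    for name, members in pathways.items():
        members = set(members) & universe
        K = len(members)
        if K == 0 or n == 0:
            continue  # nothing to test for this pathway
        k = len(members & de)
        expected = n * K / N  # overlap expected by chance
        names.append(name)
        folds.append(k / expected)
        pvals.append(hypergeom_upper_tail(k, K, n, N))
    fdrs = benjamini_hochberg(pvals)
    return {
        name: (fold, fdr)
        for name, fold, fdr in zip(names, folds, fdrs)
        if fold >= min_fold and fdr < max_fdr
    }


# Toy data with hypothetical gene and pathway names.
background = [f"g{i}" for i in range(100)]
pathways = {
    "pathway_A": [f"g{i}" for i in range(10)],
    "pathway_B": [f"g{i}" for i in range(40, 60)],
}
de_genes = [f"g{i}" for i in range(8)] + ["g50", "g51"]

hits = enriched_pathways(pathways, de_genes, background)
print(hits)
```

In this toy input only `pathway_A` passes both filters: eight of its ten genes are differentially expressed against one expected by chance, whereas `pathway_B` shows no enrichment over background.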
Genes altered in more than 25% of tumours were considered for associations with disease-free survival through the cBioPortal software using the Kaplan–Meier method with log-rank testing, with p < 0.05 taken to indicate statistical significance. Raw sequencing data have been deposited at Gene Expression Omnibus under accession number GSE51005 and all details are MIAME compliant. […]
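The gene selection step (an mRNA Z-score threshold of ±1.6 relative to normal prostate samples, then keeping genes altered in more than 25% of tumours) can be sketched as follows. This is only an illustration of the filtering logic, not cBioPortal's implementation: the function names and expression values are hypothetical, and treating a Z-score exactly at the threshold as altered is an assumption.

```python
from statistics import mean, stdev


def z_scores(tumour_values, normal_values):
    """Z-scores of tumour expression relative to the normal-sample distribution."""
    mu = mean(normal_values)
    sd = stdev(normal_values)  # sample standard deviation of the normals
    return [(value - mu) / sd for value in tumour_values]


def altered_fraction(tumour_values, normal_values, threshold=1.6):
    """Fraction of tumours whose |Z-score| reaches the threshold.

    Assumption: a Z-score exactly at the threshold counts as altered.
    """
    scores = z_scores(tumour_values, normal_values)
    altered = sum(1 for score in scores if abs(score) >= threshold)
    return altered / len(scores)


def frequently_altered(tumours, normals, threshold=1.6, min_fraction=0.25):
    """Return {gene: altered fraction} for genes altered in > min_fraction of tumours."""
    selected = {}
    for gene, tumour_values in tumours.items():
        fraction = altered_fraction(tumour_values, normals[gene], threshold)
        if fraction > min_fraction:
            selected[gene] = fraction
    return selected


# Toy expression values with hypothetical gene names.
normals = {
    "GENE_A": [10, 11, 9, 10, 10],
    "GENE_B": [5, 6, 4, 5, 5],
}
tumours = {
    "GENE_A": [10, 12, 13, 8],   # three of four tumours beyond the threshold
    "GENE_B": [5, 5.5, 4.5, 7],  # one of four tumours beyond the threshold
}

candidates = frequently_altered(tumours, normals)
print(candidates)
```

Here `GENE_B` is altered in exactly 25% of the toy tumours and is therefore excluded, since the criterion requires more than 25%.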

Pipeline specifications

Software tools: FastQC, TopHat, TopHat-Fusion, HTSeq, edgeR, RCircos, GOseq, cBioPortal
Databases: GEO, KEGG
Applications: RNA-seq analysis, Genome data visualization
Organisms: Homo sapiens
Diseases: Neoplasms, Prostatic Neoplasms