Computational protocol: Integrative multi-platform meta-analysis of gene expression profiles in pancreatic ductal adenocarcinoma patients for identifying novel diagnostic biomarkers

Protocol publication

[…] All data processing and integration procedures were performed using the R statistical programming language. The data were integrated by adapting the scheme from Turbull et al. [], with specific stages for the analysis of PDAC data from the Affymetrix and Illumina platforms. The workflow of the proposed meta-analysis is shown in . More specifically, hybridization data from Affymetrix (Cohort 1) were first normalized using the Robust Multi-array Average (RMA) method from the Bioconductor R package oligo []. Similarly, Illumina expression data (Cohort 2) were pre-processed by applying Quantile Normalization (QN) from the R package lumi []. In both cases, genes with low variability in their expression values were discarded to reduce false-positive rates.

Data from both platforms were integrated with the virtualArray R package []. This package allows data from different microarray platforms to be merged and offers several batch effect removal and cross-platform correction methods. Specifically, the data were integrated using the empirical Bayes method (ComBat) []. ComBat pools information across genes with similar expression distributions in each dataset to estimate the mean and variance for each of those genes []. From the integrated data, the genes most likely to be differentially expressed in PDAC patients versus controls were selected with the linear models for microarray data (limma) software package []. The R script for the integrative meta-analysis is included as .

To validate the selected genes as PDAC biomarkers, a leave-one-out cross-validation (LOOCV) was performed. In this validation, each sample in turn is removed from the initial dataset, leaving a temporary training set and one left-out sample (the test sample). The model is built on the training set and evaluated on the left-out sample, and the procedure is repeated until every sample has served once as the test sample.
This validation procedure is widely used to assess a prediction model when no independent validation dataset is available.

Finally, we performed a GO enrichment analysis on the set of newly discovered genes after the meta-analysis. For this purpose, an enrichment test based on the Kolmogorov-Smirnov (KS) statistical test was carried out with the topGO Bioconductor R package. This analysis identified the biological functions and processes shared by the differentially expressed genes. […]
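To make the quantile normalization step mentioned above more concrete, here is a minimal sketch in plain Python. It is not the lumi implementation and not the authors' R script: the function name `quantile_normalize`, the toy matrix `expr`, and the position-based handling of ties are all assumptions made for illustration. The idea it shows is only the core one, namely that every sample is mapped onto the same reference distribution, built from the mean of the sorted values at each rank.

```python
from statistics import mean


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Each sample (column) is forced onto a shared reference distribution:
    the mean, across samples, of the k-th smallest value.
    Ties are broken by row position, which is a simplification.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: mean of the k-th smallest value over all samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]

    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered by expression within this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene in enumerate(order):
            normalized[gene][j] = reference[rank]
    return normalized


# Hypothetical toy data: 4 genes (rows) measured in 3 samples (columns).
expr = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
norm = quantile_normalize(expr)
```

After normalization, every column of `norm` contains the same set of values, only assigned to different genes according to each gene's rank within its sample.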
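The leave-one-out cross-validation described above can be sketched as follows. The excerpt does not say which classifier was used, so the nearest-centroid rule here is purely an assumption chosen to keep the example short; the names `nearest_centroid_predict`, `loocv_accuracy`, and the toy expression values are likewise hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose class centroid is closest to `sample`.

    Assumed stand-in classifier; the protocol excerpt does not name one.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def loocv_accuracy(x, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(x)):
        # Temporary training set: every sample except the i-th.
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical expression values of two selected genes in six samples.
samples = [
    [8.1, 7.9], [7.8, 8.3], [8.4, 8.0],
    [5.1, 4.9], [4.8, 5.2], [5.3, 5.0],
]
labels = ["PDAC", "PDAC", "PDAC", "control", "control", "control"]
accuracy = loocv_accuracy(samples, labels)
```

Because the model is refitted for every held-out sample, the number of fits equals the number of samples, which is affordable for cohorts of the size typical in microarray studies.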
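As a rough illustration of a KS-based enrichment test, the sketch below compares the p-values of genes annotated to one GO term against those of all other genes. It is not topGO's implementation: topGO's exact scoring and test options may differ, the p-value uses a standard asymptotic approximation that is coarse for small gene sets, and the names `ks_statistic`, `ks_pvalue`, `gene_pvalues`, and `term_genes` are assumptions introduced for this example.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x so that ties are handled together.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov series (approximate)."""
    if d == 0.0:
        return 1.0
    effective_n = n * m / (n + m)
    root = math.sqrt(effective_n)
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical per-gene p-values from a differential expression analysis.
gene_pvalues = {
    "g1": 0.001, "g2": 0.004, "g3": 0.010, "g4": 0.020,
    "g5": 0.30, "g6": 0.45, "g7": 0.55, "g8": 0.62,
    "g9": 0.71, "g10": 0.80, "g11": 0.88, "g12": 0.95,
}
# Hypothetical set of genes annotated to one GO term.
term_genes = {"g1", "g2", "g3", "g4"}

in_term = [p for g, p in gene_pvalues.items() if g in term_genes]
rest = [p for g, p in gene_pvalues.items() if g not in term_genes]
d_stat = ks_statistic(in_term, rest)
p_enrich = ks_pvalue(d_stat, len(in_term), len(rest))
```

A small `p_enrich` indicates that the genes annotated to the term have a different score distribution from the remaining genes, which is the signal a KS-type enrichment test looks for. Unlike a count-based test, it uses the full ranking of genes and needs no significance cutoff.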

Pipeline specifications

Software tools: lumi, virtualArray, ComBat, limma, topGO
Application: Gene expression microarray analysis
Organisms: Homo sapiens
Diseases: Neoplasms; Carcinoma, Pancreatic Ductal