Dataset features


Application: Gene expression microarray analysis
Number of samples: 124
Release date: Dec 12 2011
Last update date: Oct 29 2018
Access: Public
Diseases: Hodgkin Disease; Lymphoma, Non-Hodgkin; Neoplasms; Neural Tube Defects; Leukemia, Myeloid, Acute; Neoplasms, Second Primary
Dataset link: Altered Hematopoietic Cell Gene Expression Precedes Development of Therapy-Related Myelodysplasia and Identifies Patients at Risk

Experimental Protocol

PBSC samples obtained pre-aHCT and BM samples obtained at the time of development of t-MDS/AML post-HCT were studied. The training set consisted of 18 patients who developed t-MDS/AML ("cases") after aHCT, matched with 37 controls who underwent aHCT but did not develop t-MDS/AML. One to three controls were selected per case, matched for primary diagnosis (HL/NHL), age at aHCT (±10 years), and ethnicity (Caucasian, African-American, Hispanic, other). The length of follow-up after aHCT for each control was longer than the time to t-MDS/AML in the corresponding case. The results of the training set were validated in an independent group of 36 patients (test set) consisting of 16 cases who developed t-MDS/AML post-aHCT and 20 matched controls.
In the training set, 55 PBSC samples from 18 cases and 37 matched controls were studied. BM samples from the time of development of t-MDS/AML were available for 12 cases, and BM samples obtained at a comparable time from aHCT were available for 21 matched controls. For validation, 36 PBSC samples from 16 cases and 20 matched controls were studied.
All samples had been cryopreserved as mononuclear cells. After thawing, samples were labeled with anti-CD34-APC and anti-CD45-FITC, and CD34+CD45dim cells were selected using flow cytometry. Total RNA was extracted using the RNeasy kit. RNA from 1000 cells was amplified and labeled using GeneChip® Two-Cycle Target Labeling and Control Reagents from Affymetrix. For each sample, 15 µg of cRNA was hybridized to an Affymetrix HG U133 Plus 2.0 Array.
Microarray data were analyzed using R (version 2.9) with genomic analysis packages from Bioconductor (version 2.4). Data for PBSC and BM samples were normalized separately using robust multiarray averaging with consideration of GC content (GCRMA). Probesets with low expression or low variability were filtered out. Expression of genes represented by multiple probesets was set to the median of the probesets.
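The filtering and probeset-collapsing steps described above can be illustrated with a minimal Python sketch. It is not the study's R/Bioconductor code: the probeset identifiers, gene names, and the cutoffs `min_mean` and `min_var` are hypothetical, since the protocol does not state the thresholds that were used.

```python
from statistics import median, pvariance


def filter_probesets(expr, min_mean, min_var):
    """Keep probesets whose mean expression and variance both reach the cutoffs.

    expr maps a probeset ID to its expression values, one value per sample.
    The cutoffs are illustrative assumptions, not values from the protocol.
    """
    kept = {}
    for probeset, values in expr.items():
        mean = sum(values) / len(values)
        if mean >= min_mean and pvariance(values) >= min_var:
            kept[probeset] = values
    return kept


def collapse_to_genes(expr, probeset_to_gene):
    """For each gene and sample, take the median across the gene's probesets."""
    rows_by_gene = {}
    for probeset, values in expr.items():
        gene = probeset_to_gene.get(probeset)
        if gene is None:
            continue  # probeset without a gene annotation
        rows_by_gene.setdefault(gene, []).append(values)
    # zip(*rows) walks the samples column by column
    return {
        gene: [median(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


# Hypothetical normalized expression values for three samples.
expr = {
    "ps_1": [8.0, 9.0, 10.0],
    "ps_2": [6.0, 9.5, 12.0],
    "ps_3": [7.0, 7.0, 13.0],
    "ps_4": [2.0, 2.1, 2.0],  # low expression, filtered out
    "ps_5": [9.0, 9.0, 9.0],  # no variability, filtered out
}
probeset_to_gene = {
    "ps_1": "GENE_A", "ps_2": "GENE_A", "ps_3": "GENE_A",
    "ps_4": "GENE_B", "ps_5": "GENE_C",
}

filtered = filter_probesets(expr, min_mean=3.0, min_var=0.1)
genes = collapse_to_genes(filtered, probeset_to_gene)
print(genes)
```

Taking the median per sample, rather than picking a single representative probeset, keeps one value per gene without letting a single outlying probeset dominate.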
Using a conditional logistic model (CLM) to retain the matching between cases and controls, we analyzed the magnitude of association, expressed as an odds ratio (OR), between t-MDS/AML and (i) gene expression levels in PBSC at the pre-aHCT time point; (ii) gene expression levels in BM at the time of t-MDS/AML; and (iii) the change in expression of individual genes from PBSC to the time of t-MDS/AML. The false discovery rate (FDR) was used to adjust for multiple testing. Gene set enrichment analysis (GSEA) was performed on ranked lists of genes differentially expressed between cases and controls. Where multiple significant gene sets were related to each other, analysis was performed to identify a subset of common enriched genes. Average gene expression was calculated for each set, and heatmaps were plotted to show the contrasts between cases and controls. Gene Ontology (GO) and pathway analyses were performed using DAVID 2008 and Ingenuity IPA 7.5, respectively, retaining genes with z-scores ≥1.8 or ≤-1.8 and a ≥1.5-fold change in OR between cases and controls.
The association between gene expression in the PBSC product and subsequent development of t-MDS/AML identified in the training set was validated in an independent test set of 36 PBSC samples procured from patients who developed t-MDS/AML after aHCT (16 cases) or did not (20 controls). Pre-processing, normalization, and filtering procedures for the test set were identical to those for the training set. Differential expression between cases and controls was analyzed using CLM. GSEA was performed on the ranked list of differentially expressed genes.
Prediction Analysis of Microarrays (PAM) was used to derive a prognostic gene signature from the training set to classify patients as case or control. PAM uses the "nearest shrunken centroid" approach and 10-fold cross-validation to select a parsimonious gene expression signature that classifies samples with minimal misclassification. PAM was applied to genes common to both datasets. Based on the misclassification error in cross-validation, a 63-gene signature was selected for prediction using the test data.
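To make the "nearest shrunken centroid" idea concrete, here is a simplified Python sketch. It follows a common textbook formulation (pooled within-class standard deviation, soft thresholding of standardized centroid differences) and is not the PAM software or the study's analysis code. The toy data, the class labels, and the threshold `delta` are assumptions; in practice the threshold would be chosen by cross-validation, which is not shown.

```python
import math
from statistics import median


def fit_shrunken_centroids(X, y, delta):
    """Fit a simplified nearest-shrunken-centroid classifier.

    X: list of samples, each a list of gene expression values.
    y: class label for each sample.
    delta: soft-threshold amount (an assumed value here).
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    rows = {k: [x for x, label in zip(X, y) if label == k] for k in classes}
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    centroid = {
        k: [sum(x[i] for x in rows[k]) / len(rows[k]) for i in range(p)]
        for k in classes
    }
    # Pooled within-class standard deviation for each gene.
    s = []
    for i in range(p):
        ss = sum((x[i] - centroid[k][i]) ** 2 for k in classes for x in rows[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # offset that guards against genes with tiny variance
    scale = [si + s0 for si in s]

    shrunken, selected = {}, set()
    for k in classes:
        m_k = math.sqrt(1.0 / len(rows[k]) - 1.0 / n)  # textbook form, assumed
        shrunken[k] = []
        for i in range(p):
            d = (centroid[k][i] - overall[i]) / (m_k * scale[i])
            # Soft thresholding: move d toward zero by delta, clipping at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                selected.add(i)  # gene still separates this class
            shrunken[k].append(overall[i] + m_k * scale[i] * d_shrunk)
    priors = {k: len(rows[k]) / n for k in classes}
    return {"centroids": shrunken, "scale": scale,
            "priors": priors, "selected": sorted(selected)}


def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    def score(k):
        dist = sum(((xi - c) / sc) ** 2
                   for xi, c, sc in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["centroids"], key=score)


# Toy data: gene 0 separates the classes, gene 1 does not.
X = [[5.0, 1.0], [6.0, 1.2], [5.5, 0.9],
     [1.0, 1.1], [1.5, 0.9], [0.5, 1.2]]
y = ["case"] * 3 + ["control"] * 3

model = fit_shrunken_centroids(X, y, delta=1.0)
print(model["selected"], predict(model, [5.2, 1.0]), predict(model, [0.8, 1.1]))
```

Genes whose shrunken difference reaches zero in every class drop out of the classifier, which is how raising the threshold yields a smaller signature.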

Ravi Bhatia