Computational protocol: Identification of 42 Genes Linked to Stage II Colorectal Cancer Metastatic Relapse

[…] Gene expression microarray data of colon cancer on the U133A or U133Plus2 platform (Affymetrix, Santa Clara, CA, USA) were downloaded from the Gene Expression Omnibus (GEO), including synchronous and metachronous liver metastases from CRC (GSE10961, n = 18), primary colorectal tumors (GSE13067, n = 74), primary CRCs (GSE13294, n = 155), primary CRCs (GSE14333, n = 290), colon adenomas and CRCs (GSE15960, n = 12, normal = 6), CRCs (GSE17536, n = 177 of which 144 are stage II and III), metastatic CRCs (GSE17537, n = 55), stage II CRCs (GSE18088, n = 53), stage II and III CRCs (GSE18105, n = 77, normal = 34), colon adenomas and CRCs (GSE20916, n = 101, normal = 44), CRCs (GSE23878, n = 35, normal = 24), MSI CRCs (GSE24514, n = 34, normal = 15), MSI CRCs (GSE26682, n = 331), stage II and III CRCs (GSE31595, n = 37), primary stage II CRCs (GSE33113, n = 90), primary CRC tumors (GSE35896, n = 62), serrated and conventional colorectal adenocarcinoma tumors (GSE4045, n = 37), metastatic CRCs (GSE5851, n = 80), colorectal adenomas (GSE8671, n = 32, normal = 32), and early stage CRC tumors (GSE9348, n = 70, normal = 12). Robust Multichip Average normalization was performed on each dataset using R version 2.15.3, Bioconductor affy package version 1.38.1 (Affymetrix, Santa Clara, CA, USA). The normalized data were compiled and subsequently standardized using ComBat to remove batch effects. The standardized data yielded a meta-cohort of 1820 colon carcinomas and 167 normal colon tissues. Note that some of the genes are available only on the Affymetrix U133Plus2 platform (n = 1436), a subset of the meta-cohort. To focus on the effect of tumor suppressor genes, we co-analyzed different combinations of copy number deleted genes in relation to their effect on OS and DFS. The deleted gene combinations that could be analyzed depended on the co-presence of their probes on any of the Affymetrix platforms used and on their co-underexpression in a given CRC sample.
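To illustrate what removing a batch effect means for the compiled datasets, the sketch below applies a plain per-batch location and scale adjustment to each gene. It is not ComBat: ComBat fits a model and shrinks the batch parameters with empirical Bayes, whereas this example only rescales every batch to the overall mean and standard deviation. The function name `standardize_by_batch`, the gene label, and the toy values are hypothetical and chosen for illustration only.

```python
from statistics import mean, pstdev


def standardize_by_batch(expr, batches):
    """Crude per-gene batch adjustment (NOT ComBat, illustration only).

    expr    -- dict mapping gene -> list of log2 expression values, one per sample
    batches -- list of batch labels (e.g. GEO accessions), one per sample
    Returns a dict with the same shape in which every batch has been
    rescaled to the overall mean and standard deviation of that gene.
    """
    adjusted = {}
    for gene, values in expr.items():
        grand_mean = mean(values)
        grand_sd = pstdev(values)
        out = [0.0] * len(values)
        for batch in sorted(set(batches)):
            idx = [i for i, b in enumerate(batches) if b == batch]
            batch_vals = [values[i] for i in idx]
            b_mean = mean(batch_vals)
            b_sd = pstdev(batch_vals) or 1.0  # guard against zero variance
            for i in idx:
                out[i] = (values[i] - b_mean) / b_sd * grand_sd + grand_mean
        adjusted[gene] = out
    return adjusted


# Toy example: one gene measured in two batches with a large offset.
expr = {"GENE_A": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]}
batches = ["batch1"] * 3 + ["batch2"] * 3
adjusted = standardize_by_batch(expr, batches)
print(adjusted["GENE_A"])
```

After adjustment the two toy batches share the same mean, so the offset between them no longer dominates the gene's variation. A real analysis would rely on the ComBat implementation itself rather than on this simplification.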
We stratified expression levels into quartiles (Q), with the 25th percentile as the upper bound of the first quartile (Q1), the median (50th percentile) as the upper bound of the second quartile, and the 75th percentile as the upper bound of the third quartile; the last quartile (Q4) contains the highest 25% of expression levels. […]
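The quartile stratification can be sketched as follows, using only the Python standard library. Two points are assumptions rather than details from the protocol: the percentile interpolation method (`method="inclusive"`), and the reading of "underexpressed" as "falls in Q1" in the helper `co_underexpressed`. All function names and the toy gene labels are hypothetical.

```python
from statistics import quantiles


def quartile_labels(values):
    """Label each expression value Q1..Q4 using the 25th, 50th and 75th
    percentiles of `values` as cut points."""
    p25, p50, p75 = quantiles(values, n=4, method="inclusive")
    labels = []
    for v in values:
        if v <= p25:
            labels.append("Q1")
        elif v <= p50:
            labels.append("Q2")
        elif v <= p75:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels


def co_underexpressed(expr, genes):
    """Return indices of samples in which every gene in `genes` is in Q1.

    ASSUMPTION: underexpression is taken to mean Q1 membership.
    If any gene has no probe in `expr` (no co-presence), the combination
    cannot be evaluated and an empty list is returned.
    """
    if any(g not in expr for g in genes):
        return []
    per_gene = [quartile_labels(expr[g]) for g in genes]
    n_samples = len(per_gene[0])
    return [i for i in range(n_samples)
            if all(labels[i] == "Q1" for labels in per_gene)]


# Toy data: two genes measured across eight samples.
expr = {
    "GENE_A": [1, 2, 3, 4, 5, 6, 7, 8],
    "GENE_B": [1, 8, 7, 6, 5, 4, 3, 2],
}
print(quartile_labels(expr["GENE_A"]))
print(co_underexpressed(expr, ("GENE_A", "GENE_B")))
```

Returning an empty list for a combination with a missing probe mirrors the co-presence requirement: a pair of genes can only be assessed in samples whose platform measures both.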

Pipeline specifications

Software tools: affy, ComBat
Application: Gene expression microarray analysis
Organisms: Homo sapiens
Diseases: Colonic Neoplasms; Neoplasms; Colorectal Neoplasms; Genetic Diseases, Inborn