Computational protocol: Clinical relevance of the transcriptional signature regulated by CDC42 in colorectal cancer

Protocol publication

[…] Differential gene expression induced by genetic manipulation of CDC42 levels in the SW620 cell line was determined using the Human19K Oligo Array from the Center for Applied Genomics (University of Medicine of New Jersey). The microarray hybridization protocol, signal detection and data normalization have been described previously []. The experiments included biological replicates (two different sets of clones for each genetic modification) and experimental replicates (the whole microarray experiment was performed twice). Differentially expressed genes were identified by applying a fold-change cut-off of 1.5 in both groups of cells, those overexpressing CDC42 (CDC42ov) and those with silenced CDC42 expression (CDC42i), relative to the parental SW620 cell line, and then selecting the genes with opposite differential expression between CDC42ov and CDC42i ().
Gene Set Enrichment Analysis (GSEA) was performed against the curated gene sets (C2) and GO gene sets (C5) collections of the Molecular Signatures Database v4.0 (MSigDB) []. Enrichment was assessed by hypergeometric testing as implemented in the R stats package.
Ingenuity Pathway Analysis software (IPA, Ingenuity Systems) was used to integrate the most significant biological pathways regulated by CDC42. A list of 190 differentially expressed genes was created (p-value < 0.05, fold change > 1.5 or < −1.5). This dataset, containing gene identifiers and the corresponding fold changes, was uploaded to define the functional networks of differentially expressed genes. Of the 190 genes, 24 had unknown function; the remaining 166 genes were analyzed further. [...]
Processed RNA-seq data and clinical data for The Cancer Genome Atlas (TCGA) Rectum Adenocarcinoma (READ) and Colon Adenocarcinoma (COAD) cohorts [] were obtained through the NIH Genomic Data Commons (GDC) Data Portal. Processed RNA-seq expression data were available for 628 tumors, 460 of which had clinical information.
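The microarray selection rule and the hypergeometric enrichment test described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' code: it assumes fold changes are supplied as linear expression ratios versus parental SW620, treats a ratio ≥ 1.5 as up-regulated and ≤ 1/1.5 as down-regulated (the exact boundary handling is an assumption), and uses hypothetical gene names. The helper `hypergeom_sf` is a hypothetical name; for the same counts it should give the same upper-tail probability as R's `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.

```python
from math import comb

FC_CUTOFF = 1.5  # fold-change cut-off stated in the protocol


def direction(ratio, cutoff=FC_CUTOFF):
    """+1 if up-regulated, -1 if down-regulated, 0 otherwise (ratio vs. parental SW620)."""
    if ratio >= cutoff:
        return 1
    if ratio <= 1 / cutoff:
        return -1
    return 0


def cdc42_regulated(ov_ratios, i_ratios):
    """Genes past the cut-off in both CDC42ov and CDC42i, changed in opposite directions."""
    selected = []
    for gene, ov in ov_ratios.items():
        if gene not in i_ratios:
            continue
        d_ov, d_i = direction(ov), direction(i_ratios[gene])
        if d_ov != 0 and d_ov == -d_i:  # opposite regulation implies d_i != 0 as well
            selected.append(gene)
    return sorted(selected)


def hypergeom_sf(k, universe, set_size, n_selected):
    """P(X >= k): probability that at least k of n_selected genes fall in a gene set
    of size set_size, drawing without replacement from `universe` genes."""
    upper = min(set_size, n_selected)
    tail = sum(comb(set_size, i) * comb(universe - set_size, n_selected - i)
               for i in range(k, upper + 1))
    return tail / comb(universe, n_selected)


# Hypothetical expression ratios relative to parental SW620
ov = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.8, "GENE_D": 1.1}
cdc42i = {"GENE_A": 0.6, "GENE_B": 1.6, "GENE_C": 1.7, "GENE_D": 0.4}
print(cdc42_regulated(ov, cdc42i))            # ['GENE_A', 'GENE_B']
print(round(hypergeom_sf(2, 10, 3, 4), 4))    # 0.3333
```

GENE_C is excluded because it rises in both CDC42ov and CDC42i, and GENE_D because its CDC42ov change stays below the cut-off. In a real analysis the universe would be all genes measured on the array and the test would be repeated for every MSigDB gene set, followed by multiple-testing correction.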
For the platform comparison between microarray and RNA-seq data, Human19K Oligo Array annotations could be mapped to Ensembl gene IDs for 171 of the original 190 genes. Differential expression analysis was performed on the processed HTSeq count data in R using the DESeq2 package [].
Survival analysis was performed on patients with available clinical information who were aged < 90 years, leaving 453 patients. Patients were stratified into high and low expression groups using upper-quartile-normalized FPKM values (obtained from TCGA): high-expression patients were those whose expression of a given gene was above the cancer population median. High-risk populations were then defined as the intersection of the CDC42 high-expression group with each of the CACNA2D2 low-expression, LARS2 low-expression and REG1CP high-expression groups. Survival was analyzed using a Cox proportional hazards model as implemented in the R package survival [, ]. Univariate analyses used membership in the high/low expression groups as the explanatory variable; multivariate analyses additionally included age, T classification, evidence of venous and/or lymphatic invasion, gender, tumor type (colon or rectum) and stage. Tumor stage was collapsed into two categories: Stage 1 & 2 and Stage 3. […]
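The median split, the intersection that defines a high-risk group, and a univariate Cox model can be illustrated with a small standard-library Python sketch. The authors used the R survival package; the code below is only an assumed re-expression of the univariate step, fitting one binary covariate by Newton-Raphson on the Cox partial log-likelihood with Breslow handling of tied times. Patients exactly at the median are assigned to the low group here (an assumption), the function names are hypothetical, and all patient IDs and values are made up. Multivariate adjustment for age, T classification, invasion, gender, tumor type and stage is not shown.

```python
import math
from statistics import median


def high_low(values):
    """Median split: True = expression strictly above the cohort median (assumed tie rule)."""
    cut = median(values.values())
    return {pid: v > cut for pid, v in values.items()}


def high_risk(cdc42, partner, partner_high):
    """Intersect CDC42-high patients with the partner gene's high (True) or low (False) group."""
    cdc42_hi = high_low(cdc42)
    partner_hi = high_low(partner)
    return {pid for pid in cdc42
            if pid in partner and cdc42_hi[pid] and partner_hi[pid] == partner_high}


def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x); return (beta, standard error).

    Newton-Raphson on the Cox partial log-likelihood; tied event times
    share one risk set (Breslow approximation).
    """
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if not di:  # censored patients contribute only through risk sets
                continue
            s0 = s1 = s2 = 0.0
            for tj, xj in zip(times, x):
                if tj >= ti:  # patient j is still at risk at event time ti
                    w = math.exp(beta * xj)
                    s0 += w
                    s1 += w * xj
                    s2 += w * xj * xj
            mean = s1 / s0
            score += xi - mean              # first derivative of the log-likelihood
            info += s2 / s0 - mean * mean   # observed information at the current beta
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Toy example with hypothetical values
cdc42 = {"p1": 1.0, "p2": 5.0, "p3": 6.0, "p4": 2.0}
cacna2d2 = {"p1": 9.0, "p2": 1.0, "p3": 8.0, "p4": 7.0}
print(high_risk(cdc42, cacna2d2, partner_high=False))  # {'p2'}

beta, se = cox_univariate(times=[1, 2, 3], events=[1, 1, 1], x=[1, 0, 1])
print(round(math.exp(beta), 4))  # hazard ratio: 0.7071
```

Newton-Raphson suits this problem because the Cox partial log-likelihood is concave in β, so it usually converges in a few steps. The sketch does not guard against complete separation (for example, all events in one group), where the estimate diverges, or against a covariate with no variance, where the information is zero.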

Pipeline specifications

Software tools: HTSeq, DESeq2
Databases: TCGA Data Portal, GDC
Application: RNA-seq analysis
Organisms: Homo sapiens
Diseases: Neoplasms, Colorectal Neoplasms