Computational protocol: Differential Gene Expression in Colon Tissue Associated With Diet, Lifestyle, and Related Oxidative Stress

Protocol publication

[…] Of the 197 initial tumor/non-tumor tissue pairs, five subjects failed quality control (QC) because of low sequence counts for both tumor and non-tumor tissue, and 17 were dropped because the non-tumor colonic tissue failed QC, leaving 175 subjects with high-quality expression data. Of these, 144 had diet and lifestyle questionnaire data and were included in the analysis.

In terms of specific dietary factors, we focused on carotenoids and antioxidant nutrients, including beta carotene, vitamin C, total alpha tocopherol (vitamin E), lycopene, and lutein + zeaxanthin. We also considered alcohol consumption, recent use of NSAIDs, and current smoking. For each dietary and lifestyle factor, our analysis centered on contrasting gene expression levels of individuals with lower intake or exposure levels with those of individuals with higher intake or exposure levels. Each individual’s intake or exposure was assigned a category based on their dietary and lifestyle data. Dietary data were categorized into tertiles [i.e. low (T1), moderate (T2), or high (T3)] based on the empirical distributions in the population. Cigarette smoking was categorized as never, former, or current smoker. Alcohol was categorized as non-drinker, low intake, or high intake for each type of alcohol. Use of NSAIDs (which included aspirin and/or non-steroidal anti-inflammatory drugs) was categorized as either recent user (i.e. using NSAIDs during the referent period) or non-user.

To summarize risk associated with multiple exposures, we developed an oxidative balance score (OBS) that consisted of 13 diet and lifestyle factors: pro-oxidants (dietary iron, polyunsaturated fat, and cigarette smoking) and anti-oxidants (vitamin C, vitamin E, selenium, beta carotene, lycopene, lutein + zeaxanthin, vitamin D, calcium, folic acid, and NSAID use) [].
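The tertile categorization, together with the 2/1/0 scoring rule the protocol gives for the OBS, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy intake values, and the use of Python's default `statistics.quantiles` interpolation for the cut points are all assumptions, and the mapping of non-tertile variables (smoking status, NSAID use) onto scores is not specified in the text and is left out.

```python
from statistics import quantiles


def tertile_labels(values):
    """Label each value T1 (low), T2 (moderate), or T3 (high).

    Cut points come from the empirical distribution of `values`;
    the interpolation method is an assumption of this sketch.
    """
    lower, upper = quantiles(values, n=3)  # two tertile cut points
    labels = []
    for v in values:
        if v <= lower:
            labels.append("T1")
        elif v <= upper:
            labels.append("T2")
        else:
            labels.append("T3")
    return labels


def obs_points(tertile, pro_oxidant):
    """Score one OBS component: 2 = low risk, 1 = intermediate, 0 = high risk.

    Pro-oxidants score highest at low exposure; anti-oxidants
    score highest at high exposure.
    """
    rank = {"T1": 0, "T2": 1, "T3": 2}[tertile]
    return 2 - rank if pro_oxidant else rank


# Hypothetical intake values for nine subjects.
intake = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = tertile_labels(intake)
```

Summing `obs_points` over all components for one individual gives that individual's score, with a higher total indicating greater oxidative balance.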
To create the OBS, these diet and lifestyle factors were assigned a value of 2 for low exposure to pro-oxidants or high exposure to anti-oxidants (low-risk), 1 for intermediate levels of exposure, and 0 for high exposure to pro-oxidants or low exposure to anti-oxidants (high-risk). The individual scores for the 13 variables were then summed to obtain the OBS. A higher summary score corresponded to greater oxidative balance; individuals’ OBSs were categorized as low, intermediate, or high based on tertiles of the empirical distribution of the OBSs.

For each variable of interest (specific dietary factors, NSAID use, smoking, and OBS), we assessed which genes displayed statistically significant differential expression between low and high categories using the Bioconductor package DESeq2, written for the R statistical programming environment. DESeq2 assumes that RNA-seq counts follow negative binomial distributions. It uses generalized linear modeling to test, for each gene, the null hypothesis of zero log2 fold change between high and low categories (i.e. no differential expression), and it employs both an independent-filtering method and the Benjamini and Hochberg [] procedure to improve power and control the false discovery rate (FDR). For further details regarding DESeq2, see Love et al. []. In identifying genes with significant differential expression, an FDR threshold of 0.10 was used.

To help describe the data, we report the average DESeq2-adjusted gene expression levels (adjusted counts) among individuals in the high and low categories of the dietary or lifestyle variables of interest for each differentially expressed gene, and we include fold change calculations associated with these genes.
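For readers unfamiliar with the Benjamini and Hochberg procedure, the adjustment behind the FDR threshold of 0.10 can be sketched in a few lines. This is an illustrative stand-alone version with made-up p-values, not DESeq2's implementation; it omits the independent-filtering step, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values.
pvals = [0.01, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(pvals)
# Genes called differentially expressed at an FDR of 0.10.
significant = [p <= 0.10 for p in padj]
```

A gene is called significant when its adjusted p-value is at or below the chosen FDR threshold.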
Included as a descriptive detail rather than as direct DESeq2 output, fold change was calculated as the ratio of a gene’s mean expression among individuals in the high category of a dietary or lifestyle variable to its mean expression among individuals in the low category; a fold change greater than one indicates positive differential expression (i.e. up-regulated), while a fold change between zero and one indicates negative differential expression (i.e. down-regulated).

To visualize differential gene expression between individuals in high and low categories of related diet and lifestyle variable groups, we created heat maps. Each heat map features the log2 transformation of the fold changes, calculated as described above, for genes identified as significantly differentially expressed between high and low categories of the diet and lifestyle variables considered for that heat map. Our heat maps were created using the heatmap.2 function in the ‘gplots’ package of R (http://cran.r-project.org). Distance between two vectors of log2-transformed fold changes was measured with the Euclidean metric, and median linkage was selected for this function’s agglomerative hierarchical clustering algorithm.

Bioinformatics analysis was performed with QIAGEN’s Ingenuity Pathway Analysis (IPA) [] on the list of Ensembl IDs of genes identified as differentially expressed. We used genes from the Ingenuity Knowledge Base and considered both indirect and direct relationships. Networks were limited to 35 molecules and 25 networks per analysis and included both causal and interaction networks. We included all data sources in our IPA assessment and did not restrict by species or tissue when compiling networks. We applied the Benjamini-Hochberg (B-H) multiple testing correction to assess pathways in IPA. […]
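The descriptive fold change, its log2 transform, and the Euclidean distance used when clustering the heat-map vectors can be sketched as below. The counts and function names are hypothetical and chosen only for illustration; the median-linkage clustering itself, performed in the protocol by heatmap.2 in R, is not reproduced here.

```python
from math import dist, log2
from statistics import mean


def fold_change(high_counts, low_counts):
    """Ratio of mean adjusted counts: high category over low category."""
    return mean(high_counts) / mean(low_counts)


# Hypothetical adjusted counts for two genes.
fc_gene_a = fold_change([20, 40], [10, 20])  # mean 30 vs mean 15
fc_gene_b = fold_change([5, 5], [10, 30])    # mean 5 vs mean 20

# log2 makes up- and down-regulation symmetric around zero.
log2_fc = [log2(fc_gene_a), log2(fc_gene_b)]

# Euclidean distance between this vector of log2 fold changes
# and a vector with no change in either gene.
distance = dist(log2_fc, [0.0, 0.0])
```

On the log2 scale a doubling and a halving sit at +1 and -1, which is why the heat maps are drawn from log2 fold changes rather than raw ratios.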

Pipeline specifications

Software tools: DESeq2, gplots, IPA
Application: RNA-seq analysis
Organisms: Homo sapiens
Diseases: Colonic Neoplasms
Chemicals: Ethanol, Aspirin, Ibuprofen, Vitamin E, beta Carotene