Computational protocol: Gene-Specific Differential DNA Methylation and Chronic Arsenic Exposure in an Epigenome-Wide Association Study of Adults in Bangladesh

Protocol publication

[…] Study population. The Bangladesh Vitamin E and Selenium Trial is a 2 × 2 factorial randomized chemoprevention trial evaluating the long-term effects of vitamin E and selenium supplementation on nonmelanoma skin cancer risk (). Participants were residents of rural communities in central Bangladesh. Eligibility criteria included age between 25 and 65 years, permanent residence in the study area, manifest arsenical skin lesions, and no prior history of cancer. Between April 2006 and August 2009, a total of 7,000 individuals were enrolled. Trained study physicians, blinded to participants’ arsenic exposure, conducted in-person interviews and clinical evaluations and collected urine and blood samples from participants in their homes using structured protocols. Of the enrolled participants, 413 were randomly sampled for epigenome-wide methylation analyses; these analyses used baseline biological specimens collected before the start of the trial intervention.

The study protocol was approved by the relevant institutional review boards in the United States (The University of Chicago and Columbia University) and Bangladesh (Bangladesh Medical Research Council). Participants provided informed consent before the baseline interview of the original study.

Exposure assessment. Urinary total arsenic concentration was measured in the baseline spot urine sample by graphite furnace atomic absorption spectrometry (AAnalyst 600 spectrometer; PerkinElmer, Norwalk, CT, USA), with a detection limit of 2 μg/L, in a single laboratory (Trace Metal Core Laboratory at Columbia University) (). Urinary creatinine was also measured for all participants in the same laboratory by a colorimetric method based on the Jaffe reaction (). Urinary total arsenic was divided by creatinine to obtain a creatinine-adjusted urinary total arsenic concentration, expressed in micrograms per gram of creatinine.
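The creatinine adjustment is a unit conversion followed by a ratio. The sketch below assumes urinary arsenic is reported in μg/L and creatinine in mg/dL (a common unit for Jaffe-based assays); the text does not state the creatinine unit, and the function name is hypothetical.

```python
def creatinine_adjusted_arsenic(total_as_ug_per_l: float, creatinine_mg_per_dl: float) -> float:
    """Return urinary total arsenic in µg/g creatinine.

    Assumption: arsenic in µg/L and creatinine in mg/dL (unit not given in the source).
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine must be positive")
    # mg/dL -> g/L: multiply by 10 (dL per L), then divide by 1000 (mg per g)
    creatinine_g_per_l = creatinine_mg_per_dl * 10 / 1000
    # (µg/L) / (g/L) = µg per g creatinine
    return total_as_ug_per_l / creatinine_g_per_l


print(creatinine_adjusted_arsenic(100.0, 50.0))  # 200.0
```

Dividing by creatinine corrects for urine dilution in a spot sample, so a dilute and a concentrated sample from the same exposure give comparable values.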
Creatinine-adjusted urinary total arsenic is a good biomarker of aggregate ingested arsenic exposure because it captures exposure from all sources, including water, food, soil, and dust ().

Venous whole blood samples collected at baseline were analyzed for blood arsenic concentration by inductively coupled plasma mass spectrometry (ICP-MS) using a PerkinElmer Elan DRC (dynamic reaction cell) II equipped with an AS 93+ autosampler (PerkinElmer). ICP-MS-DRC methods for metals in whole blood were developed according to published procedures (; ), with modifications for blood sample preparation as suggested by the Laboratory for ICP-MS Comparison Program (Institut National de Santé Publique du Québec).

DNA methylation. DNA was extracted using DNeasy Blood kits (Qiagen, Valencia, CA, USA), and bisulfite conversion was performed using the EZ DNA Methylation Kit (Zymo Research, Irvine, CA, USA). DNA methylation was measured in 500 ng of bisulfite-converted DNA per sample using the Illumina HumanMethylation 450K BeadChip kit (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol; this BeadChip interrogates 485,577 CpG sites per sample. The methylation score for each CpG site, the β value, lies on a continuous scale from 0 (unmethylated) to 1 (completely methylated) and was quantile normalized. Of the 413 participants for whom DNA methylation data were generated, we excluded 6 samples whose reported sex did not match the sex predicted from X- and Y-chromosome methylation patterns, and 7 samples in which > 5% of CpGs either had missing values or had a detection p > 0.05. This left 400 samples for analysis. We omitted individual β values with a detection p > 0.05.
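The sample-level exclusions and the masking of individual β values described above can be sketched with NumPy. This is an illustration under assumptions: the function name is hypothetical, predicted sex is taken as a given input (how it is derived from X/Y methylation is not shown), and the authors' actual pipeline may differ.

```python
import numpy as np


def sample_and_beta_qc(beta, det_p, reported_sex, predicted_sex,
                       p_thresh=0.05, max_bad_frac=0.05):
    """Sample-level QC followed by masking of individual β values.

    beta, det_p: arrays of shape (n_samples, n_cpgs).
    Returns (keep, masked_beta): keep is a boolean mask over samples;
    masked_beta holds only retained samples, with failed calls set to NaN.
    """
    beta = np.asarray(beta, dtype=float)
    det_p = np.asarray(det_p, dtype=float)
    # A call is "bad" if its β value is missing or its detection p exceeds the threshold
    bad = np.isnan(beta) | (det_p > p_thresh)
    bad_frac = bad.mean(axis=1)
    sex_ok = np.asarray(reported_sex) == np.asarray(predicted_sex)
    # Drop samples with a sex mismatch or with more than max_bad_frac bad calls
    keep = sex_ok & (bad_frac <= max_bad_frac)
    # In retained samples, set individual β values with detection p > threshold to NaN
    masked = np.where(det_p > p_thresh, np.nan, beta)[keep]
    return keep, masked


# Toy data: 3 samples x 4 CpGs. A looser max_bad_frac is used because with
# only 4 CpGs a single bad call is already 25% of a sample.
nan = np.nan
beta = [[0.1, 0.5, 0.9, 0.3],
        [0.2, nan, 0.8, 0.4],
        [0.3, 0.6, 0.7, 0.5]]
det_p = [[0.01, 0.01, 0.20, 0.01],
         [0.01, 0.01, 0.01, 0.01],
         [0.01, 0.01, 0.01, 0.01]]
keep, masked = sample_and_beta_qc(beta, det_p, ["M", "F", "F"], ["M", "F", "M"],
                                  max_bad_frac=0.3)
print(keep)  # [ True  True False]
```

The third sample is dropped for the sex mismatch; the first two are kept, with their failed calls set to NaN rather than removed, so the CpG columns stay aligned across samples.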
We also excluded probes on the X (n = 11,232) and Y (n = 416) chromosomes, probes with missing chromosome data (mostly control probes; n = 65), and probes with > 10% missing data across samples (n = 1,932), leaving 471,932 probes for statistical analysis. Quantile-normalized β values were logit transformed and adjusted for batch variability using ComBat software (). Based on 11 samples run in duplicate on two different plates, the average interassay Spearman correlation coefficient (rs) was 0.987 (range, 0.974–0.993).

Gene expression. Mononuclear cells were preserved in Buffer RLT and stored at −80°C; RNA was then extracted using the RNeasy Micro Kit (QIAGEN, Valencia, CA, USA). RNA concentration and quality were checked on a Nanodrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). cRNA was synthesized from 250 ng of RNA using the Illumina TotalPrep 96 RNA Amplification kit. Gene expression was measured using the Illumina HumanHT-12-v4 BeadChip with 750 ng of cRNA according to the manufacturer’s protocol. The chip contains a total of 47,231 probes covering 31,335 genes. Quantile-normalized expression values were log2 transformed and adjusted for batch variability using ComBat software (). Gene expression data were available for all 400 individuals included in these analyses.

Genotyping. Genotyping procedures have been described in detail previously (). Briefly, DNA was extracted from whole blood using the QIAamp 96 DNA Blood Kit (QIAGEN, Valencia, CA, USA). DNA samples were excluded if they had a concentration < 40 ng/μL, a 260 nm/280 nm ratio < 1.6 or ≥ 2.1 (measured by Nanodrop 1000), or fragmentation to < 2 kb (assessed by smearing on an Agilent BioAnalyzer). Genotyping was performed using the Illumina HumanCytoSNP-12 BeadChip with 250 ng of DNA according to the manufacturer’s protocol.
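The DNA exclusion criteria for genotyping amount to three thresholds applied per sample. In this minimal sketch the class and function names are hypothetical, the 260/280 criterion is read as "exclude < 1.6 or ≥ 2.1", and `min_fragment_kb` is an assumed numeric stand-in for what was in practice a visual smear assessment on the BioAnalyzer.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    sample_id: str
    conc_ng_per_ul: float    # Nanodrop concentration
    ratio_260_280: float     # Nanodrop 260 nm/280 nm absorbance ratio
    min_fragment_kb: float   # hypothetical numeric summary of the BioAnalyzer trace


def passes_genotyping_qc(s: DnaSample) -> bool:
    """True if the sample meets all three DNA QC criteria."""
    return (s.conc_ng_per_ul >= 40.0            # exclude < 40 ng/µL
            and 1.6 <= s.ratio_260_280 < 2.1    # exclude < 1.6 or >= 2.1
            and s.min_fragment_kb >= 2.0)       # exclude fragmentation below 2 kb


samples = [
    DnaSample("S1", 55.0, 1.85, 12.0),
    DnaSample("S2", 32.0, 1.90, 15.0),   # too dilute
    DnaSample("S3", 60.0, 2.10, 10.0),   # ratio at the excluded upper cutoff
    DnaSample("S4", 48.0, 1.80, 1.5),    # fragmented
]
print([s.sample_id for s in samples if passes_genotyping_qc(s)])  # ['S1']
```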
Using 257,768 genotyped single nucleotide polymorphisms (SNPs) that passed quality control, we performed imputation with MaCH based on the HapMap 3 Gujarati Indians in Houston (GIH) population (Build 36; ), yielding 1,211,988 SNPs after quality control. Genotype data were available for 393 individuals included in these analyses.

Replication study. To replicate our top differentially methylated loci, we examined associations in an independent sample of 48 Bangladeshi adult males from the ongoing Folic Acid and Creatine Trial (FACT), using data collected before the intervention. None of the participants in this sample had manifest arsenical skin lesions. The methods used to measure and analyze DNA methylation have been described in detail by . Briefly, water arsenic concentration was measured for all study participants at Columbia University, as previously described (), and individuals were categorized as having low (50–100 μg/L, n = 25) or high (> 100 μg/L, n = 23) exposure for statistical analyses. After Ficoll separation, DNA from peripheral blood mononuclear cells (PBMCs) was extracted using the 5 PRIME ArchivePure DNA Blood Kit (Fisher Scientific, Pittsburgh, PA, USA). Epigenome-wide methylation of PBMC DNA was measured at the Roswell Park Cancer Institute (Buffalo, NY, USA) using the Illumina HumanMethylation 450K BeadChip kit. Data were processed using a standard quality control protocol and adjusted for batch effects using ComBat () before analysis. Here, we report associations from linear regression models.

Statistical analyses. For each CpG site, we fit a separate linear regression model of the logit-transformed β value on continuous arsenic exposure (i.e., blood arsenic or urinary total arsenic concentration), sex, and age. We present model coefficients and SEs for a comparison of the 75th versus the 25th percentile of each arsenic distribution.
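The per-CpG model and the interquartile contrast can be sketched with plain NumPy ordinary least squares. This is an illustration, not the authors' MethLAB/SAS code: the function names are hypothetical, sex is assumed to be coded 0/1, and the data are simulated. Because the model is linear in arsenic, the 75th-versus-25th-percentile contrast is simply the arsenic slope (and its SE) multiplied by the interquartile range.

```python
import numpy as np


def logit(beta):
    """Natural-log logit of β values (must lie strictly between 0 and 1)."""
    beta = np.asarray(beta, dtype=float)
    return np.log(beta / (1.0 - beta))


def iqr_scaled_effect(beta_cpg, arsenic, sex, age):
    """OLS of logit(β) on arsenic, sex (assumed 0/1), and age for one CpG.

    Returns (effect, se) for a 75th- vs 25th-percentile contrast in arsenic.
    """
    y = logit(beta_cpg)
    arsenic = np.asarray(arsenic, dtype=float)
    X = np.column_stack([np.ones_like(arsenic), arsenic,
                         np.asarray(sex, dtype=float),
                         np.asarray(age, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance matrix
    q25, q75 = np.percentile(arsenic, [25, 75])
    iqr = q75 - q25
    # Scale slope and SE by the IQR of the exposure distribution
    return coef[1] * iqr, np.sqrt(cov[1, 1]) * iqr


# Simulated data for a single CpG (not study data)
rng = np.random.default_rng(0)
n = 200
arsenic = rng.uniform(0, 500, n)
sex = rng.integers(0, 2, n)
age = rng.uniform(25, 65, n)
true_logit = -1.0 + 0.002 * arsenic + 0.1 * sex + 0.01 * age + rng.normal(0, 0.05, n)
beta = 1.0 / (1.0 + np.exp(-true_logit))
effect, se = iqr_scaled_effect(beta, arsenic, sex, age)
```

In an epigenome-wide scan this function would be called once per retained probe; the logit (natural log) differs from the commonly used log2 "M-value" only by a constant factor.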
To correct for multiple comparisons, we used a Bonferroni-corrected significance threshold (0.05/471,932 ≈ 1 × 10⁻⁷); however, we present results for all loci with p < 1 × 10⁻⁵. For differentially methylated probes with p < 1 × 10⁻⁵, we used linear regression to examine the association of methylation with RNA transcript levels of the gene containing the methylation locus, adjusted for sex, age, and urinary total arsenic concentration. Methylation and expression data were adjusted for batch effects using ComBat () before analysis.

ComBat batch-adjusted methylation data were used to infer white blood cell type fractions using the statistical method of . Briefly, we used 466 methylation probes previously identified as associated with leukocyte distributions () to infer the distribution of white blood cell types in our study samples. Linear regression was used to evaluate the association between quartiles of arsenic exposure (based on the distribution in our study sample) and the estimated cell type fractions, adjusted for sex and age.

To evaluate associations reported in other studies, we looked up individual CpG loci of interest in our data set. For these analyses, a CpG locus with p < 0.05 was considered statistically significant. We used the R program MethLAB v1.5 () and SAS software (SAS Institute Inc., Cary, NC, USA) to run all analyses. […]
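The number of tested probes and the Bonferroni threshold in this section follow from simple arithmetic on the reported probe counts. The sketch below reproduces it; the `classify` helper is hypothetical and uses the rounded 1 × 10⁻⁷ and 1 × 10⁻⁵ thresholds quoted in the text.

```python
# Probe counts as reported in the methods
TOTAL_450K_PROBES = 485_577
EXCLUDED = {
    "X chromosome": 11_232,
    "Y chromosome": 416,
    "missing chromosome data": 65,
    "> 10% missing across samples": 1_932,
}

n_tested = TOTAL_450K_PROBES - sum(EXCLUDED.values())
# Family-wise alpha of 0.05 divided across all tested probes
bonferroni = 0.05 / n_tested


def classify(p: float) -> str:
    """Hypothetical labeling helper using the two thresholds in the text."""
    if p < 1e-7:
        return "Bonferroni-significant"
    if p < 1e-5:
        return "reported"
    return "not reported"


print(n_tested)             # 471932
print(f"{bonferroni:.3g}")  # 1.06e-07
```

The exact Bonferroni cutoff is about 1.06 × 10⁻⁷, so the rounded 1 × 10⁻⁷ used in the text is marginally more conservative.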

Pipeline specifications

Software tools: ComBat, MethLAB
Application: DNA methylation array analysis
Organisms: Homo sapiens
Diseases: Drug-Related Side Effects and Adverse Reactions
Chemicals: Arsenic