Computational protocol: Fetal DNA Methylation Associates with Early Spontaneous Preterm Birth and Gestational Age

[…] For each subject, >485,000 CpG sites across the genome were interrogated using the HumanMethylation450 BeadChip (Illumina, San Diego, CA). Briefly, 1 µg of DNA was converted with sodium bisulfite, amplified, fragmented, and hybridized on the BeadChip according to the manufacturer’s instructions. CpGassoc was used to perform quality control and calculate β values. Data points with probe detection p-values >0.001 were set to missing, and CpG sites with missing data for >10% of samples were excluded from analysis; 483,830 CpG sites passed these criteria. Samples with probe detection call rates <90% and those with an average intensity value of either <50% of the experiment-wide sample mean or <2,000 arbitrary units (AU) were excluded from further analysis. One sample of male DNA was included on each BeadChip as a technical control throughout the experiment, and reproducibility was assessed using the Pearson correlation coefficient, which was required to exceed 0.99 for all pairwise comparisons of technical replicates. For each individual sample and CpG site, the signals from methylated (M) and unmethylated (U) bead types were used to calculate a β value as β = M/(U+M). […]
We used MethLAB to test for association with PTB via linear regressions that modeled β values as the outcome and PTB as the independent variable, adjusting for gestational age (GA), gender, chip, and row on the chip. Based on previous reports and their potential contribution to PTB, we also examined birth weight percentile, gravidity, parity, infection, and smoking as potential confounding factors; none of these factors was associated with methylation of any CpG site after adjustment for multiple testing (FDR <0.05; data not shown). Birth weight percentile was based on estimated GA in accordance with the United States national registry.
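The quality-control thresholds and the β-value formula described above can be illustrated with a minimal Python sketch. This is not the CpGassoc implementation: the function names, the site-by-sample list layout, and the use of `None` for missing values are assumptions made for illustration, and the text does not state the order in which the filters were applied.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows = CpG sites, columns = samples (assumed layout)


def beta_value(m: float, u: float) -> Optional[float]:
    """Beta = M / (U + M); returns None when there is no signal at all."""
    total = m + u
    return m / total if total > 0 else None


def mask_by_detection_p(betas: Matrix, det_p: List[List[float]],
                        threshold: float = 0.001) -> Matrix:
    """Set data points with detection p-value > threshold to missing (None)."""
    return [[b if p <= threshold else None for b, p in zip(b_row, p_row)]
            for b_row, p_row in zip(betas, det_p)]


def keep_cpg_sites(betas: Matrix, max_missing: float = 0.10) -> List[int]:
    """Indices of CpG sites whose fraction of missing samples is <= max_missing."""
    return [i for i, row in enumerate(betas)
            if sum(v is None for v in row) / len(row) <= max_missing]


def keep_samples(betas: Matrix, min_call_rate: float = 0.90) -> List[int]:
    """Indices of samples whose probe call rate is >= min_call_rate."""
    n_sites = len(betas)
    n_samples = len(betas[0])
    kept = []
    for j in range(n_samples):
        called = sum(betas[i][j] is not None for i in range(n_sites))
        if called / n_sites >= min_call_rate:
            kept.append(j)
    return kept


def passes_intensity(sample_mean: float, experiment_mean: float,
                     min_fraction: float = 0.5, min_au: float = 2000.0) -> bool:
    """A sample fails if its average intensity is < 50% of the experiment-wide
    mean or < 2,000 AU."""
    return sample_mean >= min_fraction * experiment_mean and sample_mean >= min_au
```

For example, `beta_value(300, 100)` gives 0.75, meaning 75% of the signal at that site comes from the methylated bead type.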
We subsequently used MethLAB to fit similar linear regressions that modeled GA as the independent variable, adjusting for gender, chip, and row on the chip. Because it has been suggested that logit-transformed β values (a.k.a. M values) may perform better in statistical analyses, we also examined associations with M values using the strategy described above. Because the results did not differ significantly, we present results based on untransformed β values to ease biological interpretation.
The location of each CpG site was determined using the Illumina array annotation for the HumanMethylation450 BeadChip, based on build 37 of the human genome. We tested for enrichment among GA-associated sites by comparing the number of GA-associated CpG sites that did or did not occur in a particular gene region (e.g., promoter, 5′UTR, body, 1st exon, 3′UTR, or intragenic regions) to the number of non-GA-associated sites that did or did not occur in that gene region, using Fisher’s exact test. We then performed similar tests of enrichment for CpG-rich regions defined as islands or CpG-poor regions defined as shores. CpG sites with 1000 Genomes Project variants physically contained within the Illumina probe were noted in the analyses but not excluded a priori. In addition, we examined whether significant GA-associated CpG sites were enriched or depleted on the X chromosome using Fisher’s exact test.
We used GSEAPrerank to evaluate whether GA-associated CpG sites were located in genes that were enriched for specific biological processes and cellular components. Gene ontology enrichment was considered significant at FDR <0.05 following 1,000 permutations. […]
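Two of the steps above, the logit (M value) transform and the region-enrichment test, can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names and the clipping constant `eps` are hypothetical, the base-2 logit is the common convention for M values rather than something the text specifies, and the two-sided Fisher's exact test here is a small pure-Python version suited to modest counts. For array-scale counts, an established routine such as `scipy.stats.fisher_exact` would be the more practical choice.

```python
from math import comb, log2
from typing import List, Tuple


def m_value(beta: float, eps: float = 1e-6) -> float:
    """Logit-transform a beta value: M = log2(beta / (1 - beta)).
    beta is clipped to [eps, 1 - eps] (an assumption) to avoid infinities."""
    b = min(max(beta, eps), 1.0 - eps)
    return log2(b / (1.0 - b))


def enrichment_table(associated: List[bool],
                     in_region: List[bool]) -> Tuple[int, int, int, int]:
    """2x2 counts: (assoc & in region, assoc & not in region,
    non-assoc & in region, non-assoc & not in region)."""
    a = b = c = d = 0
    for is_assoc, is_in in zip(associated, in_region):
        if is_assoc and is_in:
            a += 1
        elif is_assoc:
            b += 1
        elif is_in:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value: sum the hypergeometric probabilities
    of all tables (with the same margins) no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

With this layout, a small p-value together with an odds ratio `(a * d) / (b * c)` above 1 would point to enrichment of GA-associated sites in the region, and an odds ratio below 1 to depletion.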

