Computational protocol: Ethnic and diet-related differences in the healthy infant microbiome

[…] DNA was extracted with a custom DNA extraction protocol described in []. Briefly, 100–200 mg of stool was added to 2.8 mm and 0.1 mm glass beads (MoBio Laboratories Inc., Carlsbad, CA, USA) along with 800 μl of 200 mM sodium phosphate monobasic (pH 8) and 100 μl of guanidinium thiocyanate EDTA N-lauroylsarcosine buffer (50.8 mM guanidine thiocyanate, 100 mM ethylenediaminetetraacetic acid, and 34 mM N-lauroylsarcosine). The mixture was homogenized in a PowerLyzer 24 Bench Top Homogenizer (MoBio Laboratories Inc.) for 3 min at 3000 RPM. Two enzymatic lysis steps followed. First, the sample was incubated with 50 μl of 100 mg/ml lysozyme, 500 U mutanolysin, and 10 μl of 10 mg/ml RNase for 1 h at 37 °C. Second, the sample was incubated with 25 μl of 25% sodium dodecyl sulphate, 25 μl of 20 mg/ml Proteinase K, and 62.5 μl of 5 M NaCl for 1 h at 65 °C. Debris was then pelleted in a tabletop centrifuge at maximum speed for 5 min, and the supernatant was added to 900 μl of phenol:chloroform:isoamyl alcohol (25:24:1). The sample was vortexed and centrifuged at maximum speed in a tabletop centrifuge for 10 min. The aqueous phase was recovered and passed through a Clean and Concentrator-25 column (Zymo Research, Irvine, CA, USA) following the kit directions, except that DNA was eluted in 50 μl of ultrapure water that was left on the column for 5 min before elution. DNA was quantified using a Nanodrop 2000c Spectrophotometer [].
Amplification of 150 bp tags from the V3 region of the bacterial 16S rRNA gene was performed as previously described [] with the following changes: 5 pmol of primer, 200 μM of each dNTP, 1.5 mM MgCl₂, 2 μl of 10 mg/ml bovine serum albumin, and 1.25 U Taq polymerase (Life Technologies, Carlsbad, CA, USA) were used in a 50 μl reaction volume. The PCR program was 94 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 30 s, and a final extension at 72 °C for 10 min.
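As a quick arithmetic check on the amplification step, the thermocycling program and two of the reagent additions can be written out as data. This is a minimal Python sketch, not part of the published protocol: the `Step` class and helper names are hypothetical, ramp times between temperatures are ignored, and final concentrations are simple C₁V₁ = C₂V₂ dilutions into the stated 50 μl reaction volume.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Step:
    """One hypothetical thermocycler hold: temperature (°C) and duration (s)."""
    temp_c: float
    hold_s: int


# The PCR program stated in the text.
INITIAL_DENATURATION = Step(94, 2 * 60)
CYCLE = (Step(94, 30), Step(50, 30), Step(72, 30))  # denature, anneal, extend
N_CYCLES = 30
FINAL_EXTENSION = Step(72, 10 * 60)


def total_hold_time_s(initial: Step, cycle: Sequence[Step], n_cycles: int, final: Step) -> int:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    per_cycle = sum(step.hold_s for step in cycle)
    return initial.hold_s + n_cycles * per_cycle + final.hold_s


def final_concentration(stock_conc: float, added_ul: float, reaction_ul: float) -> float:
    """C1 * V1 = C2 * V2: concentration of a stock after dilution into the reaction."""
    return stock_conc * added_ul / reaction_ul


REACTION_UL = 50.0
hold_s = total_hold_time_s(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
bsa_mg_per_ml = final_concentration(10.0, 2.0, REACTION_UL)  # 2 ul of 10 mg/ml BSA
primer_um = 5.0 / REACTION_UL  # 5 pmol in 50 ul; pmol/ul is numerically micromolar

print(f"Programmed hold time: {hold_s / 60:.0f} min")  # 57 min
print(f"BSA: {bsa_mg_per_ml:.1f} mg/ml; primer: {primer_um:.1f} uM")  # 0.4 mg/ml; 0.1 uM
```

The real run takes longer than 57 min because of temperature ramping, which depends on the instrument and is not given in the text.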
The reproducibility of DNA extraction and PCR amplification of 16S rRNA gene V3 libraries was confirmed using five samples from each cohort (ten samples in total), each extracted in triplicate (29 extractions, since one extraction failed); a subset of three extractions from each cohort was also amplified in triplicate, for a total of 41 datasets (Additional file : Figure S1).
Illumina libraries were sequenced at the McMaster Genomics Facility on the Illumina MiSeq instrument, with 250 bp reads in both the forward and reverse directions. Illumina sequences were processed with custom, in-house Perl scripts as previously described []. Briefly, after sequence trimming and alignment, operational taxonomic units (OTUs) were clustered at a 97% similarity threshold using AbundantOTU+ []. Chimera checking was not performed because we have shown that amplification of the short V3 region of the 16S rRNA gene produces very few genuine chimeric sequences []. Taxonomy was assigned to the representative sequence of each OTU using the Ribosomal Database Project (RDP) classifier [] with a minimum confidence cutoff of 0.8 against the Greengenes (2013 release) reference database []. All OTUs classified as “Root:Other” (0.03% of all sequenced reads) were excluded, as was one sample with <500 sequenced reads; singleton OTUs were retained. The final dataset comprised 41.4 million reads, with a minimum of 2.0 × 10³, a maximum of 4.3 × 10⁵, and a median of 9.0 × 10⁴ reads per sample.
Bacterial community richness and diversity (alpha diversity) were calculated from OTU abundances using the estimated species richness and Shannon diversity functions of the vegan package in R []. Differences in bacterial community composition between samples (beta diversity) were quantified with the Bray–Curtis dissimilarity on the relative abundances of all bacterial genera, and principal coordinate analysis was performed with the vegan package or the phyloseq package [] in R. [...]
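The filtering and diversity calculations above can be illustrated on a toy OTU table. This is a Python sketch under stated assumptions, not the authors' Perl/R pipeline: the dictionary layout, the `MIN_READS` constant, and all helper names are hypothetical; each OTU is mapped directly to a genus label (or to `Root:Other`); the 500-read threshold is applied after removing `Root:Other` OTUs (the text does not state the order); Shannon diversity uses the natural logarithm; and Bray–Curtis is computed as Σ|xᵢ − yᵢ| / Σ(xᵢ + yᵢ) on genus relative abundances. Richness estimation is not shown.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable

OtuTable = Dict[str, Dict[str, int]]  # sample -> {OTU id: read count}

MIN_READS = 500  # hypothetical constant for the "<500 reads" exclusion


def filter_table(counts: OtuTable, genus_of: Dict[str, str]) -> OtuTable:
    """Drop 'Root:Other' OTUs, then drop samples left with < MIN_READS reads.
    Singleton OTUs are kept, as in the text."""
    kept = {}
    for sample, otus in counts.items():
        clean = {otu: n for otu, n in otus.items() if genus_of[otu] != "Root:Other"}
        if sum(clean.values()) >= MIN_READS:
            kept[sample] = clean
    return kept


def shannon(abundances: Iterable[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over non-zero abundances."""
    values = [a for a in abundances if a > 0]
    total = sum(values)
    return -sum((a / total) * math.log(a / total) for a in values)


def genus_relative_abundance(otus: Dict[str, int], genus_of: Dict[str, str]) -> Dict[str, float]:
    """Collapse OTU counts to genera and convert them to proportions."""
    by_genus: Dict[str, float] = defaultdict(float)
    for otu, n in otus.items():
        by_genus[genus_of[otu]] += n
    total = sum(by_genus.values())
    return {g: n / total for g, n in by_genus.items()}


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den


# Toy data (not from the study).
genus_of = {"otu1": "Bifidobacterium", "otu2": "Bacteroides",
            "otu3": "Bifidobacterium", "otu4": "Root:Other"}
counts = {
    "s1": {"otu1": 400, "otu2": 200, "otu3": 1},   # singleton otu3 is kept
    "s2": {"otu1": 100, "otu2": 500, "otu4": 10},  # otu4 removed as Root:Other
    "s3": {"otu1": 300},                           # < 500 reads: sample dropped
}
table = filter_table(counts, genus_of)
alpha = {s: shannon(otus.values()) for s, otus in table.items()}
rel = {s: genus_relative_abundance(otus, genus_of) for s, otus in table.items()}
print(sorted(table))                              # ['s1', 's2']
print(round(bray_curtis(rel["s1"], rel["s2"]), 4))  # 0.5006
```

Note that alpha diversity is computed on OTU counts while Bray–Curtis uses genus-level proportions, mirroring the split described in the text.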
Simple linear regression was used to determine the effects of ethnicity and breastfeeding on alpha diversity estimates. Bacterial community differences associated with ethnicity were examined, after adjustment for potential covariates of ethnicity–microbiome associations, using permutational multivariate analysis of variance (PERMANOVA) on Bray–Curtis dissimilarities of genus-level relative abundances, as implemented in the adonis function of the vegan package in R [].
Candidate covariates for the multivariable model were informed by the existing literature and assessed formally in univariable models against microbiome diversity: years the mother had lived in Canada, breastfeeding at the time of collection, time since weaning, formula and cow’s milk use in the first year, time of introduction of solid foods, infant weight gain in the first year, birth weight, infant age at stool collection, mode of delivery, gestational diabetes, mother’s antibiotic use during pregnancy and labor, and mother’s vegetarian status. Each candidate variable chosen above was then used separately to predict Bray–Curtis dissimilarities with the same PERMANOVA approach. Variables with p < 0.10 entered a forward stepwise procedure, in which the most significant covariates were added to the model in order of the proportion of variance explained, stopping when the next most significant covariate had p above the 0.05 threshold.
The associations of genus-level abundances with ethnicity and/or breastfeeding were determined with a multivariate algorithm that adjusts for significant covariates, implemented in the MaAsLin package in R [, ]. Briefly, covariates found to be significant (p < 0.05) predictors of the microbiome in the procedure above were included in a multivariate, boosted, additive general linear model relating the covariate data to bacterial genus-level abundances. P values were adjusted for multiple testing using the false discovery rate and are reported as q values; q < 0.05 was considered significant.
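Two statistical building blocks named above, a PERMANOVA permutation test and false-discovery-rate adjustment, can be sketched in a few lines. This is a simplified Python illustration, not the adonis or MaAsLin implementation: it tests a single grouping factor (adonis can also fit several covariates sequentially), uses Anderson's one-factor pseudo-F with p = (hits + 1)/(permutations + 1), and assumes the Benjamini–Hochberg procedure for q values, which the text does not name. The toy distance matrix, function names, and seed are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pseudo_f(dist: Matrix, groups: Sequence[str]) -> float:
    """One-factor PERMANOVA pseudo-F computed from squared dissimilarities."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    if ss_within == 0:
        return math.inf
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(dist: Matrix, groups: Sequence[str], n_perm: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Permutation p-value: share of shuffled labelings with F >= observed F."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg: q_(k) = min over j >= k of m * p_(j) / j, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Toy example: two well-separated groups of four samples (not study data).
coords = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
dist = [[abs(x - y) for y in coords] for x in coords]
groups = ["A"] * 4 + ["B"] * 4
f_obs, p = permanova(dist, groups)
print(f"pseudo-F = {f_obs:.0f}, p = {p:.3f}")
print([round(q, 4) for q in bh_qvalues([0.01, 0.04, 0.03, 0.20])])  # [0.04, 0.0533, 0.0533, 0.2]
```

With only eight samples, just 2 of the 70 possible labelings separate the groups perfectly, so even a very large pseudo-F yields a p-value near 0.03; permutation p-values are bounded by the number of distinct relabelings.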
Genera with a coefficient of variation >0.001 were included in Additional file : Table S1. […]
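The coefficient-of-variation filter used for Table S1 can be expressed as a one-line rule per genus. This is a minimal Python sketch assuming CV = sample standard deviation / mean of each genus's relative abundances across samples (the text does not say which standard deviation was used), with zero-mean genera assigned a CV of 0 so that they are excluded; the function names and toy data are hypothetical.

```python
import statistics
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean; 0 for all-zero genera."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.stdev(values) / mean


def genera_above_cv(rel_abundance: Dict[str, List[float]], threshold: float = 0.001) -> List[str]:
    """Return genera whose CV across samples exceeds the threshold."""
    return [g for g, values in rel_abundance.items()
            if coefficient_of_variation(values) > threshold]


# Toy relative abundances across three samples (not study data).
rel_abundance = {
    "Bifidobacterium": [0.60, 0.20, 0.40],  # varies across samples: kept
    "Bacteroides": [0.10, 0.10, 0.10],      # constant: CV = 0, excluded
    "Absent": [0.0, 0.0, 0.0],              # never observed: excluded
}
print(genera_above_cv(rel_abundance))  # ['Bifidobacterium']
```

A threshold as low as 0.001 effectively removes only genera that are absent or essentially constant across samples.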

Pipeline specifications

Software tools: RDP Classifier, vegan, phyloseq, MaAsLin
Databases: Greengenes
Applications: Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Homo sapiens
Diseases: Diabetes Mellitus
Chemicals: Lactic Acid