Computational protocol: Use of dietary indices to control for diet in human gut microbiota studies

Protocol publication

[…] A subset of 2070 individuals was used to assess the extent to which variation within and between individuals’ microbiota could be captured by each dietary index. Collection and processing of samples for 16S rRNA gene sequencing in the TwinsUK cohort have been described previously []. Individuals brought samples to a clinical visit or posted them in sealed ice packs to the research department, where they were stored at −80 °C until shipped frozen for analysis. DNA was extracted at Cornell University, where the V4 region of the 16S rRNA genes was amplified. A multiplexed approach was used to sequence the amplicons on the Illumina MiSeq platform. Following demultiplexing, the paired-end reads of each sample were merged using a 200 nt minimum overlap. 16S rRNA gene sequencing data were processed and OTUs generated as described previously []: chimeric sequences were identified de novo and removed per sample using USEARCH, and de novo OTUs were then picked in QIIME using SUMACLUST at a similarity threshold of 97% []. The OTU representative sequences were aligned using the parallel_align_seqs_pynast command within QIIME; the resulting alignment was filtered to remove variable regions using the filter_alignment command, and a phylogenetic tree was created using the make_phylogeny command. All commands were run with the default parameters in QIIME version 1.9.1.

Alpha diversity metrics (Shannon diversity, Chao1, Simpson’s diversity and observed species) were also calculated in QIIME. OTU tables were rarefied to 10,000 sequences per sample 50 times, and each of the four alpha diversity metrics was then calculated as the mean for each sample across the 50 rarefied tables. Mixed-effects models were constructed using the “lme4” package in R to assess the extent to which alpha diversity varied with dietary index; all model variables were scaled prior to input, and all reported coefficients are standardised []. Nested models were used to compare the effect of each dietary index.
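The repeated-rarefaction step described above (subsample each sample to a fixed depth many times, then average the metric) can be sketched in plain Python. This is an illustrative sketch only, not the QIIME implementation: the function names, the dictionary input format, sampling without replacement and the use of log base 2 for Shannon diversity are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a dict of OTU -> read count."""
    # Expand the counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    return Counter(rng.sample(reads, depth))


def shannon(counts):
    """Shannon diversity in bits (log base 2 is an assumption), skipping zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)


def mean_rarefied_shannon(counts, depth=10_000, n_iter=50, seed=0):
    """Mean Shannon diversity across `n_iter` independent rarefactions of one sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = [shannon(rarefy(counts, depth, rng)) for _ in range(n_iter)]
    return sum(values) / n_iter
```

Averaging over repeated subsamples reduces the noise that a single random rarefaction would add to each sample's diversity estimate; the same pattern applies to the other three metrics.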
Models were adjusted for age, BMI, twin zygosity, sex and OTU count per sample, with technical covariates and FFQ batch as random effects. Because χ2 values from ANOVA of two mixed models are only appropriate for comparing nested models, the relative goodness of fit of the three dietary indices was assessed using the t values, AICs and β coefficients from the mixed-effects models for each index, which quantified the ability of a dietary index to capture each measure. To further assess the ability of dietary indices to capture variance, hierarchical models of alpha diversity were fitted with BMI and, in a smaller subset (n = 2015), with frailty data.

Relative abundances of OTUs found in > 25% of individuals were log10 transformed, and residuals were generated via regression against the technical covariates of sequencing depth, sequencing run, person who extracted the DNA, person who loaded the DNA and sample collection method. OTUs were also collapsed to taxonomic abundances at the Family and Genus levels. All OTU metrics were used as response variables in mixed-effects models (as above) adjusted for age, twin zygosity, BMI and sex, with FFQ batch as a random effect. Nested models were compared using ANOVA, and p values were false discovery rate (FDR) adjusted using the qvalue package []. Twin pairs discordant by more than one standard deviation and falling in different quartiles were identified, and OTU differences between the two twins were assessed using paired Wilcoxon rank-sum tests with FDR adjustment.

Unweighted UniFrac distances were calculated as β diversity measures using the phyloseq package in R []. Ordination plots were also generated using phyloseq, and the first 10 components from the PCoA (representing the first 10 axes) were extracted and used as response variables in mixed-effects models, as in the alpha diversity analysis.
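The prevalence filter and log10 transform applied to OTU relative abundances above can be illustrated with a short sketch. The source does not say how zero abundances were handled before taking logarithms, so the pseudocount below is purely an assumption, as are the function name and the dictionary-of-lists input format.

```python
import math


def filter_and_log_transform(rel_abund, min_prevalence=0.25, pseudocount=1e-6):
    """Keep prevalent OTUs and log10-transform their relative abundances.

    rel_abund: dict mapping OTU id -> list of relative abundances,
               one value per individual (hypothetical input layout).
    An OTU is kept only if it is present (abundance > 0) in MORE than
    `min_prevalence` of individuals.
    """
    kept = {}
    for otu, values in rel_abund.items():
        prevalence = sum(v > 0 for v in values) / len(values)
        if prevalence > min_prevalence:
            # The pseudocount is an assumption; it avoids log10(0).
            kept[otu] = [math.log10(v + pseudocount) for v in values]
    return kept
```

In the protocol the transformed values are then regressed against the technical covariates and the residuals carried forward; that regression step is not reproduced here.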
Finally, weighted UniFrac distances between twin pairs were used as the response variables in regression models with difference in dietary index, difference in BMI, and differences in factorial technical variables (person who extracted the DNA, person who loaded the DNA and sample collection method) as covariates. Standardised coefficients were calculated using the lm.beta package []. […]
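As a rough illustration of what a standardised coefficient is, the sketch below rescales an ordinary least-squares slope by the ratio of the predictor's standard deviation to the response's standard deviation, which is the usual definition. It is deliberately simplified to a single predictor, whereas the models described above include several covariates; the function name and example variables are hypothetical, and this is not the lm.beta code itself.

```python
import statistics


def standardised_slope(x, y):
    """Slope of the regression y ~ x, rescaled by sd(x) / sd(y)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Ordinary least-squares slope for a single predictor.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # Rescaling makes the coefficient independent of the units of x and y.
    return slope * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical within-pair differences in dietary index and UniFrac distances.
diet_diff = [0.0, 1.0, 2.0, 3.0, 4.0]
unifrac = [0.10, 0.30, 0.20, 0.50, 0.40]
beta_std = standardised_slope(diet_diff, unifrac)
```

Because the result is unit-free, coefficients for covariates measured on different scales (for example, dietary index difference and BMI difference) can be compared directly.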

Pipeline specifications

Software tools: USEARCH, QIIME, SUMACLUST, PyNAST, phyloseq
Applications: Phylogenetics, 16S rRNA-seq analysis
Organisms: Homo sapiens