Computational protocol: Oral Microbiota in Infants Fed a Formula Supplemented with Bovine Milk Fat Globule Membranes - A Randomized Controlled Trial

[…] PCR amplification of the V3-V4 hypervariable region of the bacterial 16S rRNA gene, using the forward primer 341F (CCTACGGGAGGCAGCAG) and the reverse primer 806R (GGACTACHVGGGTWTCTAAT), sample library preparation, and Illumina MiSeq sequencing were conducted at the Forsyth Sequencing Facility (HOMINGS). Sequences have been uploaded at doi: 10.6084/m9.figshare.3422962. Paired-end reads were merged using FLASH, and merged reads and barcodes were matched using a Python script. Quality filtering of retained sequences was done using Quantitative Insights into Microbial Ecology (QIIME, version 1.8.0). Sequences with a minimum length of 300 base pairs after primer sequence removal, with correct barcode and primer sequences, and meeting the default QIIME quality filtering criteria for homopolymers and quality scores were retained. Chimeric sequences, as identified by UCLUST, were removed. Retained sequences were clustered into operational taxonomic units (OTUs) at 97% similarity to the Human Oral Microbiome Database (HOMD), and one representative sequence per OTU was taxonomically named by BLAST against the same database. The HOMD is a curated database holding 700 named species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies. In HOMD, approximately 54% of taxa are named species, 14% are unnamed (but cultivated), and 32% are known only as uncultivated phylotypes. Each 16S rRNA phylotype is given a unique Human Oral Taxon (HOT) number.
[...] IBM SPSS Statistics (version 22.0; IBM Corporation, Armonk, NY, USA) was used for descriptive analyses and univariate testing of differences and associations. Normally distributed variables (normality confirmed by Shapiro-Wilk tests) were presented as means with 95% confidence intervals, and differences between means were tested with ANOVA. For non-normally distributed variables, medians with ranges were calculated, and the Kruskal-Wallis test was used to test differences between groups.
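
The primer and length criteria in the sequence processing step above can be illustrated with a minimal Python sketch. It is not the QIIME 1.8.0 implementation and does not cover barcode matching, homopolymer limits, or quality scores. The function names (`trim_primers`, `passes_filter`) are hypothetical, and the sketch assumes that a merged read starts with the forward primer and ends with the reverse complement of the reverse primer, with exact matches apart from the IUPAC degenerate positions.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles degenerate codes (e.g. H <-> D, V <-> B).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FORWARD_PRIMER = "CCTACGGGAGGCAGCAG"     # 341F, as given in the protocol
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"  # 806R, as given in the protocol
MIN_LENGTH = 300                         # bp required after primer removal


def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Expand a degenerate primer into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in primer)


# Assumption: forward primer at the 5' end, reverse-complemented reverse
# primer at the 3' end of the merged read, nothing outside them.
READ_PATTERN = re.compile(
    "^" + primer_regex(FORWARD_PRIMER)
    + "([ACGT]*)"
    + primer_regex(reverse_complement(REVERSE_PRIMER)) + "$"
)


def trim_primers(read):
    """Return the sequence between the primers, or None if a primer is missing."""
    match = READ_PATTERN.match(read)
    return match.group(1) if match else None


def passes_filter(read):
    """Keep reads with both primers and at least MIN_LENGTH bp between them."""
    insert = trim_primers(read)
    return insert is not None and len(insert) >= MIN_LENGTH


# Synthetic example read (not real data): primers flanking 300 bases.
example = FORWARD_PRIMER + "A" * 300 + "ATTAGATACCCTGGTAGTCC"
print(passes_filter(example))  # True
```

Expanding the degenerate bases into character classes lets one pattern match every primer variant, which is why the reverse primer is reverse-complemented before expansion rather than after.
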
HOT taxa prevalence was highly skewed, with >50% of the subjects lacking detection. Therefore, detection frequency (% children) and mean prevalence (% of total number of reads) in the three feeding groups are presented, together with median prevalence among those with detected taxa. Differences in distributions between groups were tested using the Chi-square test. For comparisons between HOT taxa, a p-value ≤0.008 (accounting for multiple testing by the false discovery rate) was considered statistically significant; for other variables, a p-value <0.05 was considered statistically significant.
Rarefaction curves were calculated to compare microbial richness among the three feeding group samples, and principal coordinate analysis (PCoA) was used to compare the phylogenetic diversity (β diversity) and search for clustering of samples based on the OTU assignment by QIIME. In addition, multivariate principal component analysis (PCA) and partial least squares (PLS) regression (SIMCA P+, version 12.0, Umetrics AB, Umeå, Sweden) were done with assigned taxa and potential confounders (mode of delivery, sex, anthropometric measures and lactobacilli by culture) in the independent block. These analyses were used to search for clustering of samples based on taxonomic assignment with the addition of potential confounders (PCA) and to identify taxa associated with the EF, SF or BFR groups (PLS regression), as previously described. For PCA and PLS regression, variables were auto-scaled to unit variance, and cross-validated predictions of Y (here feeding groups) were calculated. Clustering of subjects is displayed in a score-loading plot, and the importance of each x-variable is displayed in a loading plot. Variables with a 95% confidence interval for the PLS correlation coefficient that did not include zero were considered statistically significant. Besides the explanatory R2-value, PLS regression provides a cross-validated predictive value (Q2) of the model. […]
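
The three summary measures reported for skewed HOT taxa (detection frequency, mean prevalence, and median prevalence among children with the taxon detected) can be sketched in Python as below. The function name `summarize_taxon` and the input layout are hypothetical. The sketch also assumes that prevalence is available per child as a percentage of that child's total reads and that the mean is taken across children; the protocol does not spell out either detail.

```python
from statistics import mean, median


def summarize_taxon(prevalence_pct):
    """Summarize one taxon within one feeding group.

    prevalence_pct: list with one value per child, the taxon's share of
    that child's reads in percent (0 means the taxon was not detected).
    """
    detected = [value for value in prevalence_pct if value > 0]
    return {
        # Share of children in whom the taxon was detected at all.
        "detection_frequency_pct": 100.0 * len(detected) / len(prevalence_pct),
        # Mean over all children, including those without detection.
        "mean_prevalence_pct": mean(prevalence_pct),
        # Median restricted to children with detection; None if nobody has it.
        "median_prevalence_detected_pct": median(detected) if detected else None,
    }


# Made-up illustrative values for five children (not study data).
summary = summarize_taxon([0, 0, 0, 2.0, 4.0])
print(summary)
```

Reporting the median only among children with detection avoids a median of zero whenever more than half of the group lacks the taxon, which is the situation the text describes.
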

Pipeline specifications

Software tools: QIIME, UCLUST, SPSS
Databases: HOMD
Applications: Miscellaneous, Phylogenetics, 16S rRNA-seq analysis
Organisms: Moraxella catarrhalis, Bos taurus, Homo sapiens
Diseases: Otitis, Pulmonary Fibrosis