Computational protocol: Intestinal microbiota of preterm infants differ over time and between hospitals

Protocol publication

[…] If stored in thioglycollate, stool was thawed and centrifuged for 10 min at 4,000 × g, and the supernatant was removed. For all samples, 100 μL of TE buffer containing lysozyme and proteinase K was added to 0.24 g of thawed stool, and the mixture was vortexed for 10 min. Then 1.2 mL of buffer RLT with beta-mercaptoethanol was added, and the sample was transferred to sterile bead-beating tubes containing 0.3 g of 0.1-mm glass beads. Samples were homogenized for 3 min in a bead beater and centrifuged at 4,000 × g for 5 min to pellet debris. The supernatant was transferred to a clean microcentrifuge tube and spun at 4,000 × g for an additional 2 min to remove remaining debris. The supernatant was then transferred to a Qiagen AllPrep DNA spin column, and DNA was isolated using the Qiagen AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA, USA).
From the extracted DNA, 180-nt paired-end reads were generated using established primers and protocols, with samples allocated across multiple Illumina MiSeq (Illumina, San Diego, CA, USA) runs []. Read pairs were merged into amplicon-spanning sequences, which were then filtered to remove those with less than 70% identity to any sequence in the rRNA16S.gold.fasta reference set (http://drive5.com/uchime/uchime_download.html) using “usearch -usearch_global -id 0.70”. In total, 79,076,883 sequences were processed with the UPARSE pipeline [] (software version usearch7.0.959_i86linux64). The following commands were run with default settings unless otherwise specified. Dereplication (-derep_fulllength) yielded 35,605,130 sequences, removal of singletons (-sortbysize -minsize 2) left 2,206,563 sequences, and clustering (-cluster_otus) yielded 7,249 OTU representative sequences. The OTU table was constructed by mapping reads to OTUs (-usearch_global -strand plus -id 0.97) and applying the Python script uc2otutab.py (http://drive5.com/python/). No additional chimera filtering was applied.
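The dereplication, singleton-removal, clustering, and read-mapping steps above can be illustrated with a toy, pure-Python sketch. It is not USEARCH or UPARSE, and all function names are hypothetical. It assumes equal-length, pre-aligned sequences (so identity is a simple per-position match fraction) and reads already tagged with a sample ID, and it omits the 70% reference filter and UPARSE's built-in chimera check. Clustering follows the abundance-ordered greedy centroid idea, with 97% identity used both for clustering and for mapping reads back to OTUs.

```python
from collections import Counter, defaultdict


def dereplicate(reads):
    """Collapse identical reads; each count plays the role of a USEARCH size annotation."""
    return Counter(reads)


def remove_singletons(uniques, min_size=2):
    """Keep unique sequences seen at least min_size times, most abundant first."""
    kept = [(seq, n) for seq, n in uniques.items() if n >= min_size]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


def identity(a, b):
    """Fraction of matching positions (assumes equal-length, pre-aligned sequences)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sorted_uniques, min_id=0.97):
    """Abundance-ordered greedy clustering: a sequence founds a new OTU only if it
    is below min_id identity to every existing centroid."""
    centroids = []
    for seq, _size in sorted_uniques:
        if all(identity(seq, c) < min_id for c in centroids):
            centroids.append(seq)
    return centroids


def build_otu_table(labelled_reads, centroids, min_id=0.97):
    """Map each (sample_id, sequence) read to its most similar centroid and count
    hits per sample; reads below min_id to every centroid stay unassigned."""
    table = defaultdict(Counter)
    for sample, seq in labelled_reads:
        best = max(centroids, key=lambda c: identity(seq, c))
        if identity(seq, best) >= min_id:
            table[best][sample] += 1
    return table


# Toy data: 40-nt reads; "near" differs from "a" at one position (97.5% identity).
a, near, t, g = "ACGT" * 10, "ACGT" * 9 + "ACGA", "T" * 40, "G" * 40
uniques = dereplicate([a, a, a, near, near, t, t, g])  # g is a singleton
otus = greedy_cluster(remove_singletons(uniques))      # near joins a's OTU
table = build_otu_table([("s1", a), ("s1", near), ("s2", t), ("s2", g)], otus)
print(len(otus), dict(table[a]), dict(table[t]))  # prints: 2 {'s1': 2} {'s2': 1}
```

Sorting by decreasing abundance before clustering matters: abundant sequences are the most likely to be real biological templates, so they are given the chance to become centroids first.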
QIIME [] version 1.6 was used to classify the OTU representative sequences with the gg_13_5 Greengenes taxonomy and reference sequences clustered at 99% similarity. A phylogenetic tree was constructed within QIIME using FastTree on filtered PyNAST alignments of the OTU representative sequences. OTUs with a total count below a minimum count fraction of 0.0002 were removed from the OTU table in QIIME, leaving 525 unique OTUs. […] Comparisons between years and hospitals were restricted to week 1 and 2 samples, because hospital 2 had only a single week 3 sample in 2011.
Differences in clinical characteristics among groups by week were tested using Fisher’s exact test for categorical variables and the t test for continuous variables. To standardize comparisons of microbiota, we rarefied the OTU table to 2,000 reads per sample. Rarefaction randomly selects reads from the complete set obtained for each sample until the specified number of reads is reached. Because every sample then has the same number of reads, each has an equal chance of including rare OTUs regardless of its original sequencing depth, so samples can be compared directly.
Alpha diversity was calculated for weeks 1 and 2 using two metrics: Simpson's diversity index (1 - D) and Chao1. The Kruskal-Wallis (KW) test was applied to detect differences in alpha diversity by year and hospital. To examine beta diversity, we used non-metric multidimensional scaling (NMDS) to ordinate the microbial communities based on both unweighted and weighted UniFrac distances calculated in QIIME, as described in Morrow et al. []. Unweighted UniFrac considers only presence or absence, whereas weighted UniFrac also accounts for differences in abundance.
Significant differences in specific taxa between hospitals and by year were identified by linear discriminant analysis effect size (LEfSe) []. Linear and logistic generalized estimating equation (GEE) models were used to test the association of the taxa identified by LEfSe with hospital after adjustment for other potential confounders.
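The low-abundance filter, rarefaction, and alpha-diversity steps described above can be sketched in pure Python. This is an illustration under stated assumptions, not QIIME's code: each sample is assumed to be a Counter of OTU counts, the 0.0002 threshold is read as an OTU's share of all reads in the table, and samples with fewer than 2,000 reads are dropped (how such samples were handled is not stated here). Simpson's index uses the plug-in estimate D = sum of p_i^2. Chao1 uses the classic S_obs + F1^2/(2*F2), falling back to the bias-corrected S_obs + F1*(F1 - 1)/2 when there are no doubletons. The sample and OTU names are hypothetical.

```python
import random
from collections import Counter


def filter_low_abundance(samples, min_fraction=0.0002):
    """samples: {sample_id: Counter(otu_id -> count)}. Drop OTUs whose total count
    across all samples is below min_fraction of all reads in the table."""
    otu_totals = Counter()
    for counts in samples.values():
        otu_totals.update(counts)
    grand_total = sum(otu_totals.values())
    keep = {otu for otu, n in otu_totals.items() if n / grand_total >= min_fraction}
    return {s: Counter({o: n for o, n in c.items() if o in keep}) for s, c in samples.items()}


def rarefy(counts, depth=2000, rng=None):
    """Randomly draw `depth` reads without replacement; None if the sample is too small."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    if rng is None:
        rng = random.Random()
    return Counter(rng.sample(pool, depth))


def simpson(counts):
    """Simpson's diversity 1 - D, with D = sum of squared relative abundances."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def chao1(counts):
    """Chao1 richness from the numbers of singleton (F1) and doubleton (F2) OTUs."""
    observed = sum(1 for n in counts.values() if n > 0)
    f1 = sum(1 for n in counts.values() if n == 1)
    f2 = sum(1 for n in counts.values() if n == 2)
    if f2 > 0:
        return observed + f1 * f1 / (2 * f2)
    return observed + f1 * (f1 - 1) / 2


rng = random.Random(42)  # fixed seed so the random draw is reproducible
samples = {
    "s1": Counter({"otu1": 9000, "otu2": 2999, "otu3": 1}),
    "s2": Counter({"otu1": 500}),
}
filtered = filter_low_abundance(samples)  # otu3 (1 of 12,500 reads) is removed
rarefied = {s: rarefy(c, 2000, rng) for s, c in filtered.items()}  # s2 -> None
print(sum(rarefied["s1"].values()), rarefied["s2"])  # prints: 2000 None
print(simpson(Counter({"a": 1, "b": 1})), chao1(Counter({"a": 1, "b": 1, "c": 2})))  # prints: 0.5 5.0
```

Rarefying before computing Chao1 matters because singleton and doubleton counts, and therefore the richness estimate, grow with sequencing depth.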
In the GEE models, a backward elimination approach was used to remove non-significant covariates, and samples from the same infant in different weeks were included in the same model, with GEE accounting for the correlation between these repeated samples.
Differences in the degree of succession between hospitals were tested using the Jaccard index. Jaccard values were calculated between the week 1 and week 3 samples of the 28 infants who had samples collected across the first 3 weeks of life. As used here, the Jaccard index is expressed as a distance (1 minus the number of shared OTUs divided by the number of OTUs found in either sample), so it measures how much community membership changes over time: identical communities have a value of 0, and completely non-overlapping communities have a value of 1. KW was used to test for differences in the within-infant Jaccard values by hospital and year. […]
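The within-infant Jaccard comparison and the KW test described above can be sketched with the standard library alone. Assumptions not taken from the source: each sample is a dict of OTU counts, an OTU counts as present if its count is above zero, and the Jaccard value is computed as a distance to match the 0-to-1 interpretation given in the text. The KW H statistic includes the usual correction for ties. Only H is computed; its p-value would come from a chi-square distribution with k - 1 degrees of freedom (for example via scipy.stats.kruskal), which this sketch leaves out. The hospital groupings and OTU names in the usage lines are hypothetical.

```python
from collections import Counter


def jaccard_distance(week1, week3):
    """1 - |shared OTUs| / |OTUs present in either sample| (presence/absence only)."""
    a = {otu for otu, n in week1.items() if n > 0}
    b = {otu for otu, n in week3.items() if n > 0}
    union = a | b
    if not union:
        return 0.0  # two empty samples: treat as identical
    return 1.0 - len(a & b) / len(union)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = Counter(values).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    return h / correction


# Hypothetical week 1 vs week 3 OTU counts for two infants at each of two hospitals.
hospital_1 = [jaccard_distance({"otu1": 5, "otu2": 1}, {"otu2": 3, "otu3": 7}),
              jaccard_distance({"otu1": 2}, {"otu1": 4})]
hospital_2 = [jaccard_distance({"otu4": 1}, {"otu5": 2}),
              jaccard_distance({"otu1": 3, "otu4": 1}, {"otu5": 9})]
print(hospital_1, hospital_2, kruskal_wallis_h(hospital_1, hospital_2))
```

A rank-based test such as KW suits Jaccard distances because they are bounded between 0 and 1 and are often far from normally distributed.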

Pipeline specifications

Software tools: UCHIME, USEARCH, UPARSE, QIIME, FastTree, PyNAST, UniFrac, LEfSe
Databases: Greengenes
Applications: Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Homo sapiens
Diseases: Enterocolitis, Necrotizing