Computational protocol: Bacterial community structure and functional contributions to emergence of health or necrotizing enterocolitis in preterm infants


Protocol publication

[…] Amplicon libraries targeting the 16S rRNA-encoding gene were sequenced on the GS-FLX platform with XLR70 sequencing chemistry to obtain deep surveys of the microbial communities. The PCR primers were specific for the V3-V4 region of the 16S rRNA-encoding gene (Escherichia coli positions 338 to 802) and carried 454-specific adapter sequences as well as an 8-bp barcode. This barcode-based primer approach allowed multiple samples to be sequenced in a single 454 run without the need for physical partitioning.

Processing of the 16S rRNA-derived sequence inventories was performed using the QIIME toolkit (QIIME 1.5.0). Briefly, OTUs were picked at 97% sequence identity using UCLUST, and the most abundant sequence in each OTU was chosen as its representative sequence. Representative sequences were then aligned using PyNAST and assigned a taxonomic classification using the RDP Classifier. The PyNAST-aligned sequences were also used to build a phylogenetic tree with FastTree, from which unweighted UniFrac distances were calculated between all samples. […]

The metagenome data were analyzed using the MG-RAST system, which can annotate against a number of existing databases through the M5nr; we selected Subsystems-based annotations. First, sequence fragments underwent a quality control and de-replication step (exact duplicates are treated as sequencing artifacts). Next, the sequences were screened against multiple functional (protein) and taxonomic (16S rRNA-based) databases via BLAT-based comparison to the MG-RAST M5nr to identify potential protein-encoding genes (PEGs) and other features (for example, rRNA genes). A phylogenomic reconstruction of each sample (taxonomic inference from the nearest matching protein) was then computed using the taxonomic information associated with each match in the M5nr.
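The barcode-based multiplexing above means reads must be sorted back to samples before analysis. The sketch below is a minimal illustration, assuming each read begins with its 8-bp barcode and that barcodes match exactly; the barcode sequences and sample names (`sample_01`, `sample_02`) are hypothetical, and QIIME performs this step with its own scripts and error handling.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map; real barcodes and sample IDs would come
# from the sequencing run's mapping file.
BARCODES = {
    "ACGTACGT": "sample_01",
    "TGCATGCA": "sample_02",
}
BARCODE_LEN = 8  # 8-bp barcodes, as in the protocol


def demultiplex(reads, barcodes=BARCODES, bc_len=BARCODE_LEN):
    """Group reads by their leading barcode, trimming it from assigned reads.

    Exact matching only: a read whose first bc_len bases match no known
    barcode is kept, untrimmed, under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:bc_len])
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[bc_len:])
    return dict(by_sample)


reads = ["ACGTACGTTTAGGC", "TGCATGCAGGCCTA", "NNNNNNNNAAAA", "ACGTACGTCCGA"]
print(demultiplex(reads))
```

Keeping unmatched reads untrimmed makes it easy to inspect them later, for example to decide whether error-tolerant barcode matching is warranted.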
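OTU picking at 97% identity with a most-abundant representative can be illustrated with a simplified greedy centroid clustering in the spirit of UCLUST. This is a sketch, not UCLUST itself: it assumes unique sequences are visited in decreasing abundance and it measures identity crudely as 1 minus the Levenshtein distance divided by the longer length, whereas UCLUST computes identity from its own alignments. All function names are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance (unit costs) using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def identity(a, b):
    """Crude identity: 1 - edit distance / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def pick_otus(seqs, threshold=0.97):
    """Greedy centroid clustering (simplified, UCLUST-like).

    Unique sequences are visited in decreasing abundance; each joins the first
    centroid it matches at >= threshold identity, otherwise it seeds a new OTU.
    Returns a list of OTUs, each a Counter of member sequences.
    """
    counts = Counter(seqs)
    centroids, otus = [], []
    for seq, n in counts.most_common():
        for k, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                otus[k][seq] = n
                break
        else:
            centroids.append(seq)
            otus.append(Counter({seq: n}))
    return otus


def representative(otu):
    """Most abundant member sequence, used as the OTU's representative."""
    return otu.most_common(1)[0][0]


base = "ACGT" * 25          # made-up 100-bp sequence
near = "T" + base[1:]       # 1 mismatch: 99% identity, same OTU
far = "G" * 100             # very different: new OTU
otus = pick_otus([base] * 5 + [near] * 2 + [far])
print(len(otus), [sum(o.values()) for o in otus])  # 2 [7, 1]
```

Visiting abundant sequences first makes them the centroids, so the representative usually coincides with the centroid, though the two concepts are kept separate here.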
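The RDP Classifier is a naive Bayesian classifier over 8-base words. The sketch below shows the core scoring idea, assuming the word prior P(w) = (n(w) + 0.5) / (N + 1) and the genus-conditional probability (m(w) + P(w)) / (M + 1), where N is the number of training sequences, n(w) the number containing word w, M the number in the genus and m(w) the number in the genus containing w. It omits the bootstrap confidence estimate and hierarchical ranks; the reference sequences and genus labels are made up.

```python
import math
from collections import Counter, defaultdict

K = 8  # word size used by the RDP Classifier


def kmers(seq, k=K):
    """Distinct k-mers (words) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(reference):
    """reference: list of (sequence, genus) pairs."""
    word_in_all = Counter()                 # n(w)
    word_in_genus = defaultdict(Counter)    # m(w) per genus
    genus_size = Counter()                  # M per genus
    for seq, genus in reference:
        words = kmers(seq)
        word_in_all.update(words)
        word_in_genus[genus].update(words)
        genus_size[genus] += 1
    return len(reference), word_in_all, word_in_genus, genus_size


def classify(query, model):
    """Return the genus with the highest summed log word probability."""
    n_total, word_in_all, word_in_genus, genus_size = model
    best, best_score = None, -math.inf
    for genus, m_size in genus_size.items():
        score = 0.0
        for w in kmers(query):
            prior = (word_in_all[w] + 0.5) / (n_total + 1)
            score += math.log((word_in_genus[genus][w] + prior) / (m_size + 1))
        if score > best_score:
            best, best_score = genus, score
    return best


seq_a = "ACGTACGGTCAGTTAGCATGCA" * 2
seq_b = "TTGGCCAATTGGCCAAGGTTCC" * 2
model = train([(seq_a, "Genus_A"), (seq_a[2:], "Genus_A"),
               (seq_b, "Genus_B"), (seq_b[1:], "Genus_B")])
print(classify(seq_a[5:35], model))  # Genus_A
```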
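Unweighted UniFrac compares two samples by the fraction of tree branch length that leads to OTUs observed in only one of them. A minimal from-scratch sketch, assuming a toy rooted tree stored as a hypothetical child-to-(parent, branch length) dictionary rather than the FastTree output QIIME would use:

```python
# Toy rooted tree ((otu1:1,otu2:1)n1:1,otu3:2)root, stored as
# child -> (parent, length of the branch above the child).
TREE = {
    "otu1": ("n1", 1.0),
    "otu2": ("n1", 1.0),
    "n1": ("root", 1.0),
    "otu3": ("root", 2.0),
}


def covered_branches(present_tips, parent):
    """Nodes whose parent branch lies on a path from a present tip to the root."""
    covered = set()
    for tip in present_tips:
        node = tip
        # Stop at the root (no parent entry) or at a node already walked past.
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node][0]
    return covered


def branch_length(nodes, parent):
    """Total length of the branches above the given nodes."""
    return sum(parent[n][1] for n in nodes)


def unweighted_unifrac(tips_a, tips_b, parent):
    """Unique branch length / branch length observed in either sample."""
    a = covered_branches(tips_a, parent)
    b = covered_branches(tips_b, parent)
    total = branch_length(a | b, parent)
    return branch_length(a ^ b, parent) / total if total else 0.0


print(unweighted_unifrac({"otu1", "otu2"}, {"otu1", "otu3"}, TREE))  # 0.6
```

Because only presence or absence matters, the unweighted form is sensitive to rare lineages; abundance-weighted UniFrac would instead scale each branch by the difference in relative abundance.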
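The quality control and de-replication step can be pictured as a filter followed by removal of exact duplicates. This is only a sketch: the length and ambiguous-base thresholds (`MIN_LEN`, `MAX_N`) are hypothetical placeholders, and MG-RAST applies its own QC criteria.

```python
MIN_LEN = 50  # hypothetical minimum read length
MAX_N = 5     # hypothetical maximum number of ambiguous (N) bases


def qc_and_dereplicate(reads, min_len=MIN_LEN, max_n=MAX_N):
    """Drop short or N-rich reads, then keep the first copy of each exact duplicate."""
    passed = [r for r in reads if len(r) >= min_len and r.upper().count("N") <= max_n]
    seen, unique = set(), []
    for read in passed:
        if read not in seen:  # exact duplicates are treated as artifacts
            seen.add(read)
            unique.append(read)
    return unique


reads = ["A" * 60, "A" * 60, "C" * 60, "G" * 10, "N" * 60]
kept = qc_and_dereplicate(reads)
print(len(reads), "->", len(kept))  # 5 -> 2
```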
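The phylogenomic reconstruction infers each read's taxon from its nearest matching protein. A minimal best-hit sketch, assuming similarity hits are available as hypothetical (read ID, protein ID, bit score) tuples and a hypothetical protein-to-taxon map; MG-RAST's own implementation and scoring may differ.

```python
from collections import Counter


def best_hit_taxonomy(hits, taxonomy_of):
    """Assign each read the taxon of its highest-scoring protein match.

    hits: iterable of (read_id, subject_id, bit_score) tuples.
    taxonomy_of: dict mapping subject_id -> taxon label.
    Returns a Counter of reads per taxon (a simple community profile).
    """
    best = {}
    for read_id, subject_id, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (subject_id, score)
    return Counter(taxonomy_of[subject] for subject, _ in best.values())


# Hypothetical hits and protein-to-taxon map
hits = [("r1", "p1", 50.0), ("r1", "p2", 80.0), ("r2", "p3", 40.0), ("r3", "p1", 60.0)]
taxonomy_of = {"p1": "Enterobacteriaceae", "p2": "Staphylococcaceae", "p3": "Enterobacteriaceae"}
print(best_hit_taxonomy(hits, taxonomy_of))
```

A best-hit rule is simple but can over-assign reads to well-represented taxa in the reference database; lowest-common-ancestor schemes are a common, more conservative alternative.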
The data are publicly available via the MG-RAST server hosted at Argonne National Laboratory, which provides instant access to the sequence data, bioinformatic analyses and tools, plus support for metadata encoded using the minimum information about a (meta)genome sequence (MIGS/MIMS) standards.

Subsystems count data for each sample metagenome were log-transformed and centered to facilitate parametric tests. Principal component analysis (PCA) was used to compare condensed subsystems data among samples; it was computed on the variance-covariance matrix because the relative sequence abundances, based on the proportion of annotated subsystems, were similar across all sampling units. These analyses were performed in R using the stats and pcaMethods packages. Results were considered significant at α = 0.05 after Bonferroni correction; this stringent adjustment was chosen deliberately to demonstrate the robustness of the observed trends. […]
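The log transform, centering and covariance-based PCA can be sketched in a few lines. This uses NumPy rather than the R packages named above, and it assumes a pseudocount of 1 to avoid log(0) and centering on each subsystem's (column) mean; neither choice is specified in the protocol. The count matrix is made up.

```python
import numpy as np


def log_center(counts, pseudocount=1.0):
    """Log-transform counts and center each subsystem (column) on its mean.

    The pseudocount avoids log(0); its value is an assumption.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=0)


def pca_covariance(X):
    """PCA of a centered samples-by-variables matrix via its covariance matrix.

    Returns (scores, eigenvalues), components sorted by decreasing variance.
    """
    cov = np.cov(X, rowvar=False)           # variables in columns
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return X @ eigvecs, eigvals


# Made-up counts: 4 samples (rows) x 3 subsystems (columns)
counts = [[120, 30, 5], [100, 45, 8], [80, 60, 20], [150, 20, 2]]
X = log_center(counts)
scores, eigvals = pca_covariance(X)
explained = eigvals / eigvals.sum()
print(np.round(explained, 3))
```

Using the covariance matrix keeps each subsystem's variance in its original (log) units; the correlation matrix would instead give every subsystem equal weight, which matters mainly when variables are on very different scales.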
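Bonferroni correction controls the family-wise error rate across m tests by comparing each p-value against α/m. A minimal sketch with made-up p-values:

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m tests.

    Adjusted p-values are min(1, m * p); a test is significant when its
    adjusted p-value is <= alpha (equivalently, p <= alpha / m).
    """
    m = len(p_values)
    adjusted = [min(1.0, m * p) for p in p_values]
    significant = [q <= alpha for q in adjusted]
    return adjusted, significant


# Made-up p-values for four hypothetical tests
adjusted, significant = bonferroni([0.001, 0.02, 0.04, 0.3])
print(significant)  # [True, False, False, False]
```

With four tests the per-test threshold drops to 0.0125, which is why p = 0.02 and p = 0.04 no longer count as significant; this conservativeness is what makes surviving trends robust.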

Pipeline specifications

Software tools: QIIME, UCLUST, PyNAST, RDP Classifier, FastTree, UniFrac, BLAT, MIxS
Databases: MG-RAST
Applications: Miscellaneous, Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Bacteria, Homo sapiens
Diseases: Intestinal Diseases, Enterocolitis, Necrotizing