Computational protocol: 16S rRNA gene pyrosequencing of reference and clinical samples and investigation of the temperature stability of microbiome profiles

Protocol publication

[…] A data analysis workflow based on the Quantitative Insights Into Microbial Ecology (QIIME) pipeline was implemented (Figure  ). The pyrosequencing data (SFF file) were demultiplexed with a barcode mismatch tolerance of one base for the 11-base Molecular Identifier (MID) tags. Raw reads were then quality filtered in the following consecutive steps: (1) terminal trimming to remove ambiguous bases (N) from the 3′-end of the raw reads; (2) removal of reads shorter than 200 bases or longer than 1,000 bases; (3) removal of reads containing a homopolymer run of eight or more bases; (4) removal of reads with more than one error in the 16S primer 539R sequence; (5) trimming of primer and linker sequences; and (6) sliding-window trimming with a window width of 50 bases to remove terminal sequence where the average quality score within the window falls below 25.
Chimera filtering was then performed with the UCHIME algorithm in both reference-based and de novo modes, and reads classified as chimeric by both methods were removed. Finally, singleton reads were excluded from further analysis.
For bacterial taxonomic classification, the quality-processed reads were analyzed with the QIIME pipeline, which is run through Python programs. The workflow included open-reference clustering of sequences into operational taxonomic units (OTUs) with the UCLUST tool at a sequence identity level of 97%, a commonly used bioinformatic definition of a bacterial species based on the 16S rRNA gene. The read clusters were then assigned taxonomy using the RDP classifier at a confidence level of 80%. The resulting microbial profiles contained assignments at various hierarchical levels of the taxonomy, and the positions of these assignments in the taxonomy were used to assess the diversity of each community.
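The read-level quality filtering steps described above can be illustrated with a minimal Python sketch. It is not the QIIME implementation used in the protocol. The function names, the assumption that the primer sits at the 5′ start of each read, the use of a simple position-by-position mismatch count for primer errors, and the choice to truncate a read at the start of the first low-quality window are all assumptions made for illustration. Linker trimming is omitted because the linker sequence is not given in the text, and the primer sequence is passed in as a parameter for the same reason.

```python
import re
from typing import List, Optional, Tuple

# Thresholds taken from the protocol text.
MIN_LEN, MAX_LEN = 200, 1000   # allowed read length in bases
HOMOPOLYMER_LIMIT = 8          # runs of this length or longer are rejected
MAX_PRIMER_ERRORS = 1          # more than one primer error rejects the read
WINDOW = 50                    # sliding-window width in bases
MIN_WINDOW_QUAL = 25           # minimum mean quality within a window

Read = Tuple[str, List[int]]   # (sequence, per-base quality scores)


def trim_trailing_n(seq: str, qual: List[int]) -> Read:
    """Remove ambiguous bases (N) from the 3' end of the read."""
    trimmed = seq.rstrip("Nn")
    return trimmed, qual[:len(trimmed)]


def has_long_homopolymer(seq: str, limit: int = HOMOPOLYMER_LIMIT) -> bool:
    """True if any base repeats `limit` or more times in a row."""
    return re.search(r"(.)\1{%d,}" % (limit - 1), seq.upper()) is not None


def primer_errors(seq: str, primer: str) -> int:
    """Count mismatches against a primer assumed to start the read."""
    head = seq[:len(primer)].upper()
    mismatches = sum(a != b for a, b in zip(head, primer.upper()))
    return mismatches + (len(primer) - len(head))


def window_trim(seq: str, qual: List[int], window: int = WINDOW,
                min_mean: float = MIN_WINDOW_QUAL) -> Read:
    """Truncate the read at the first window whose mean quality is too low."""
    for start in range(len(seq) - window + 1):
        if sum(qual[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


def filter_read(seq: str, qual: List[int], primer: str) -> Optional[Read]:
    """Apply the steps in the order listed in the text; None means rejected."""
    seq, qual = trim_trailing_n(seq, qual)
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        return None
    if has_long_homopolymer(seq):
        return None
    if primer_errors(seq, primer) > MAX_PRIMER_ERRORS:
        return None
    seq, qual = seq[len(primer):], qual[len(primer):]  # strip the primer
    return window_trim(seq, qual)
```

A caller would pass each demultiplexed read with its quality scores and the primer sequence to `filter_read` and keep only the reads for which the result is not `None`.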
In the statistical analyses, reads assigned to taxonomic levels below the genus level were mapped to the corresponding genus so that statistical significance could be evaluated at the genus level. [...] The genus-level microbiome profiles from the QIIME/RDP analysis were used to evaluate microbial community diversity within a sample (α-diversity) and between samples (β-diversity). QIIME tools for variability analysis were used, including comparison of the abundance of microbial taxa present in the samples, the weighted UniFrac measure, and multidimensional principal coordinate analysis (PCoA). Two recently proposed methods were used in combination for multinomial statistical analysis of the microbiome data. The statistical analysis consisted of three steps: (1) for each microbiome community, use the R statistical software package for HMP (HMP-R) by La Rosa et al. to test the underlying probabilistic model based on the Dirichlet multinomial (DM) distribution and to determine the DM parameters (proportions and dispersion); (2) use HMP-R to perform hypothesis testing of overall significant differences between communities; and (3) use the R software package metagenomeSeq to determine the OTUs that differ significantly between the two communities. […]
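The genus-level mapping and the within-sample diversity evaluation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the QIIME or HMP-R code used in the protocol. It assumes lineages are semicolon-separated strings ordered from kingdom to species, with genus as the sixth rank, and it uses the Shannon index as one common α-diversity measure; the text does not name the specific α-diversity metric that was applied. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

# Assumed rank order: kingdom, phylum, class, order, family, genus, species.
GENUS_DEPTH = 6


def to_genus(lineage: str) -> str:
    """Truncate a lineage string so that it ends at the genus rank.

    Lineages that stop above genus are returned unchanged.
    """
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    return ";".join(ranks[:GENUS_DEPTH])


def genus_profile(counts: Dict[str, int]) -> Counter:
    """Sum read counts of all lineages that map to the same genus."""
    profile: Counter = Counter()
    for lineage, n in counts.items():
        profile[to_genus(lineage)] += n
    return profile


def shannon(profile: Dict[str, int]) -> float:
    """Shannon index (natural log) of a count profile."""
    total = sum(profile.values())
    return -sum((n / total) * math.log(n / total)
                for n in profile.values() if n > 0)
```

Collapsing to genus before computing diversity means that two species of the same genus contribute a single category, which matches the genus-level evaluation described in the text.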

Pipeline specifications

Software tools: QIIME, UCHIME, UCLUST, RDP Classifier, UniFrac, metagenomeSeq
Application: 16S rRNA-seq analysis
Organisms: Homo sapiens