Computational protocol: Identification of Household Bacterial Community and Analysis of Species Shared with Human Microbiome

Protocol publication

[…] Raw sequence files were processed by (1) demultiplexing, (2) trimming primer sequences, (3) quality filtering, (4) sequencing error correction, (5) taxonomic assignment, and (6) detection of chimeras. In the demultiplexing step, each sample was identified by its unique barcode, and low-quality reads (average quality score <25 or read length <300 bp) were removed from further analysis. Pairwise sequence alignment and the hmmsearch program of the HMMER 3.0 package were used to trim primer sequences based on the profile of the 16S rRNA V1−V3 regions. To correct sequencing errors, representative sequences from clusters of trimmed sequences were selected and used for taxonomic identification (details in Supplementary Methods). Individual reads were assigned their taxonomic positions according to the highest pairwise similarity among the top five BLASTN hits against the EzTaxon-e database. Chimeric sequences were removed with UCHIME. The read number in each sample was normalized by random subsampling. Diversity indices and species richness were calculated using three different methods: Cluster Database at High Identity with Tolerance (CD-HIT), Taxonomy-Based Clustering (TBC), and Taxonomy-Dependent Clustering (TDC)-TBC (details in Supplementary Methods). The compositions and proportions of bacterial species shared between two samples, or between sets of multiple samples, were calculated using CLcommunity software (ChunLab, Inc., Korea). Bray-Curtis, Jaccard, and Sørensen abundance similarity coefficients were calculated using mothur, and the Fast UniFrac matrix was generated using CLcommunity. Principal coordinate analyses (PCoA) based on the calculated similarity coefficients were used to represent the relationships between samples. The significance of differences among bacterial communities was assessed by Libshuff analysis using mothur.
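The quality filter and the random subsampling described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, reads are assumed to be already parsed into a sequence string and a list of per-base quality scores, and normalizing every sample to the size of the smallest one is an assumption (the protocol does not state the subsampling depth). Only the two thresholds (average quality 25, length 300 bp) come from the text.

```python
import random
from statistics import mean

MIN_MEAN_QUALITY = 25  # reads with average quality < 25 are removed (from the protocol)
MIN_LENGTH = 300       # reads shorter than 300 bp are removed (from the protocol)


def passes_quality_filter(sequence, qualities):
    """Return True if a read is kept (hypothetical helper).

    `sequence` is the base string and `qualities` the per-base quality
    scores, assumed to have the same length.
    """
    if len(sequence) < MIN_LENGTH:
        return False
    return mean(qualities) >= MIN_MEAN_QUALITY


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads at random without replacement (hypothetical helper)."""
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(reads, depth)


def normalize_samples(samples, seed=0):
    """Subsample every sample to the size of the smallest one.

    `samples` maps a sample name to its list of reads. Using the smallest
    sample as the common depth is an assumption, not stated in the protocol.
    """
    depth = min(len(reads) for reads in samples.values())
    return {name: subsample_reads(reads, depth, seed) for name, reads in samples.items()}
```

A fixed seed keeps the subsampling reproducible between runs; in practice the draw is often repeated with several seeds to check that downstream diversity estimates are stable.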
Pyrosequencing reads generated in this study are available at the EMBL SRA database under the study accession number ERP002164 ( […]
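As an illustration of the between-sample comparisons named in the protocol, the sketch below computes the textbook Bray-Curtis dissimilarity and the incidence-based (presence/absence) Jaccard dissimilarity from per-taxon read counts. It is a simplified stand-in, not mothur's implementation: mothur's abundance-based Jaccard and Sørensen calculators use different estimators, and the function names and input layout (a dictionary from taxon name to read count) are assumptions made for this example.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Each argument maps a taxon name to its read count. The result is
    1 - 2 * (sum of per-taxon minima) / (total reads in both samples).
    """
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1 - 2 * shared / total


def jaccard(counts_a, counts_b):
    """Incidence-based Jaccard dissimilarity (presence/absence only)."""
    present_a = {t for t, n in counts_a.items() if n > 0}
    present_b = {t for t, n in counts_b.items() if n > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1 - len(present_a & present_b) / len(union)
```

Both functions return 0 for identical samples and 1 for samples with no taxa in common; a matrix of such pairwise values is the kind of input that a principal coordinate analysis takes.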

Pipeline specifications

Software tools: HMMER, BLASTN, UCHIME, CD-HIT, mothur, Fast UniFrac
Databases: EzBioCloud
Application: 16S rRNA-seq analysis
Organisms: Homo sapiens, Cutibacterium acnes, Bacteroides vulgatus
Diseases: Foodborne Diseases, Staphylococcal Infections