Computational protocol: The Microbiome of Aseptically Collected Human Breast Tissue in Benign and Malignant Disease

Protocol publication

[…] After sequencing, adapter-primer sequences were removed from reads as previously described. In total, 15,156,581 reads passed quality control. Paired-end reads were analyzed with the IM-TORNADO bioinformatics pipeline. Taxonomy was assigned against the Greengenes reference database (v13.5), and operational taxonomic units (OTUs) were assigned using a 97% identity threshold. Taxonomic identification was manually checked using BLAST, which identified one misclassification that we corrected in the downstream analysis. The sequencing depth of the negative controls and samples, as well as of benign breast disease (BBD) samples versus invasive cancers, is illustrated in […], and a barplot of taxonomic profiles of the negative control buccal and skin swabs at the phylum, family and genus level is shown in […]. […]
To compare the microbial communities between groups (e.g. different tissue types and disease states), we summarized microbiota data using both α-diversity and β-diversity measures. Two α-diversity metrics were used: the observed OTU number and the Shannon index. The observed OTU number reflects species richness, whereas the Shannon index places more weight on species evenness. β-diversity, by contrast, indicates the shared diversity between bacterial populations in terms of ecological distance; different distance metrics provide distinctive views of community structure. Two β-diversity measures, unweighted and weighted UniFrac distances, were calculated using the OTU table and a phylogenetic tree (with the “GUniFrac” function in the R package GUniFrac). The unweighted UniFrac reflects differences in community membership (i.e., the presence or absence of an OTU), whereas the weighted UniFrac mainly captures differences in abundance. To reduce the potential confounding effect of uneven sampling, we rarefied the OTU table to a sequencing depth of 20,000 per sample for both diversity analyses.
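The rarefaction step and the two α-diversity metrics described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analysis was run in R); the function names `rarefy`, `observed_otus` and `shannon`, the dictionary representation of a sample, and the toy depth of 40 reads are all assumptions made for illustration. The Shannon index is computed with the natural logarithm, which is a common convention but is not stated in the protocol.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from one sample.

    `counts` maps OTU id -> read count (hypothetical representation).
    Returns None when the sample has fewer than `depth` reads.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    # Expand to one label per read, then draw reads without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, depth)))


def observed_otus(counts):
    """Richness: number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with p_i > 0."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Toy sample; the study rarefied to 20,000 reads, 40 is used here
    # only to keep the example small.
    sample = {"OTU_a": 50, "OTU_b": 30, "OTU_c": 20}
    sub = rarefy(sample, depth=40, seed=1)
    print(sum(sub.values()))               # always equals the requested depth
    print(observed_otus(sub), shannon(sub))
```

Rarefying before computing either metric keeps richness and evenness comparable across samples with different read totals, which is the motivation given in the text.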
To assess the association with α-diversity, we fitted a linear regression model to the α-diversity metrics after rarefaction, adjusting for technical covariates such as sequencing batch where necessary. A Wald test was used to determine significance. To assess the association with β-diversity measures, we used the recently proposed MiRKAT, a kernel-based association test based on ecological distance matrices. MiRKAT also allows easy adjustment for covariates such as sequencing batch. To further address the potential concern about differential sequencing depth between groups, we adjusted for sequencing depth in the model, in addition to rarefaction. Ordination plots were generated using principal coordinate analysis as implemented in R (“cmdscale” function in the R ‘vegan’ package). […]
PICRUSt was used to infer the abundance of functional categories (KEGG metabolic pathways) from the 16S rRNA data. Specifically, the input to PICRUSt is an OTU table built by a closed-reference OTU picking strategy, which involves comparison to an existing reference (Greengenes v13.5). The output of PICRUSt is a count table of functional categories, such as KEGG pathways, constructed from the functional content of each OTU. Rarefaction was not performed on the OTU table, but singletons were removed before PICRUSt prediction. The predicted functional count table was normalized into relative abundances, and differential abundance analysis was performed using the same permutation test that was used for the taxon analysis. Batch effects were adjusted for in the model. We reported differentially abundant KEGG pathways with unadjusted P < 0.05. All statistical analyses were performed in R 3.0.2 (R Development Core Team, Vienna, Austria). […]
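The normalization and permutation-testing steps for the predicted pathway table can be sketched as follows. The excerpt does not specify the test statistic or how batch was adjusted, so this Python sketch makes its own assumptions: it uses the absolute difference in group means as the statistic, ignores batch and other covariates, and uses hypothetical function names (`relative_abundance`, `permutation_pvalue`). It should be read as an illustration of a generic two-group permutation test, not as the test used in the study.

```python
import random


def relative_abundance(counts):
    """Normalize one sample's pathway counts so they sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def permutation_pvalue(values, labels, n_perm=999, seed=0):
    """Two-sided permutation p-value for one pathway.

    `values`: relative abundance of the pathway in each sample.
    `labels`: True/False group membership per sample (e.g. cancer vs. benign).
    Statistic (an assumption of this sketch): |mean(group1) - mean(group0)|.
    """
    def stat(lab):
        g1 = [v for v, is_g1 in zip(values, lab) if is_g1]
        g0 = [v for v, is_g1 in zip(values, lab) if not is_g1]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    rng = random.Random(seed)
    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        # Small tolerance so ties are counted despite floating-point error.
        if stat(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: one pathway measured in 5 + 5 samples.
    values = [0.10, 0.11, 0.12, 0.13, 0.14, 0.30, 0.31, 0.32, 0.33, 0.34]
    labels = [False] * 5 + [True] * 5
    p = permutation_pvalue(values, labels)
    print("report pathway" if p < 0.05 else "do not report")
```

Because the reported threshold is an unadjusted P < 0.05, a per-pathway p-value like the one above would be compared to 0.05 directly, with no multiple-testing correction applied.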

Pipeline specifications

Software tools: IM-TORNADO, MiRKAT, PICRUSt
Databases: KEGG, KEGG PATHWAY, Greengenes
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Homo sapiens
Diseases: Breast Diseases, Breast Neoplasms, Neoplasms