Computational protocol: Skin fungal community and its correlation with bacterial community of urban Chinese individuals


Protocol publication

[…] Following sequencing, de-multiplexing and barcode removal were performed. A total of 7,226,928 reads in each of the forward and reverse .fastq files were overlapped and merged using FLASH [], with a maximum overlap of 250 bp. Quality filtering of the merged paired-end reads was performed using the “fastq_filter” command in USEARCH [], with a maximum expected error of 1 error/read. Reads were truncated to a uniform length of 275 bp, and reads shorter than 275 bp were discarded, leaving 6,297,874 reads that passed quality control. These reads were clustered into operational taxonomic units (OTUs) at 97 % identity using UPARSE via the “cluster_otus” command in USEARCH []. Reference-based chimera detection was performed with the “uchime_ref” command in USEARCH [], using the recently released UNITE/INSDC representative sequence set (11 March 2015 version) [] as the reference. Each OTU was assigned taxonomy with the QIIME script “assign_taxonomy.py” using default parameters. Two reference databases were compared for the coverage and accuracy of OTU taxonomic classification: the commonly used UNITE fungal reference database [] (2 March 2015 version, 55,404 sequences) and the recently curated fungal reference database (23,456 sequences) previously constructed by Findley et al. []. Subsequent analyses involving taxonomic data were based on the curated database, as it assigned taxonomy to a higher percentage of reads (see “” section). OTU lineages representing more than 5 % of the reads in negative controls were deemed possible contaminants and were removed from all samples. After quality control and removal of chimeric and contaminant reads, 4,124,756 reads were retained for downstream community analysis.
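The expected-error criterion used in quality filtering can be illustrated with a short Python sketch. The expected number of errors in a read is the sum of the per-base error probabilities implied by its Phred scores, p = 10^(−Q/10). A read therefore passes a maximum of 1 expected error only if those probabilities sum to at most 1. Several assumptions apply. The function names are hypothetical. Phred+33 quality encoding is assumed. Applying the error threshold after truncation to 275 bp is our assumption about the ordering. This is an illustration, not USEARCH's implementation.

```python
def phred_to_error_prob(qual_char: str, offset: int = 33) -> float:
    """Convert one FASTQ quality character (assumed Phred+33) to an error probability."""
    q = ord(qual_char) - offset
    return 10 ** (-q / 10)


def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(c, offset) for c in qual)


def filter_read(seq: str, qual: str, trunc_len: int = 275, max_ee: float = 1.0):
    """Hypothetical filter: return the truncated (seq, qual) pair, or None if discarded.

    Reads shorter than trunc_len are dropped. Longer reads are truncated to
    trunc_len first, and the expected-error threshold is then applied to the
    truncated read (an assumed ordering).
    """
    if len(seq) < trunc_len:
        return None
    seq, qual = seq[:trunc_len], qual[:trunc_len]
    if expected_errors(qual) > max_ee:
        return None
    return seq, qual
```

For example, a 300 bp read with uniform Q40 qualities ('I') is truncated to 275 bp and kept, because its expected error is about 0.03. A read with uniform Q10 qualities ('+') accumulates roughly 27.5 expected errors and is discarded.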
OTU, read count, and taxonomic information (based on Findley’s reference database) are provided in Additional file : Table S1.
To compare the taxonomic coverage of the reference databases in other populations, data from two American studies were selected [, ]. Although the studies employed different primers, both target the ITS1 region. The first characterizes multiple body sites of ten asymptomatic adults (ethnicity unknown) in the Washington, D.C. (Bethesda) area []. Only the forehead, forearm, and palm sites from this study were used for the comparative analysis. This study was selected because it is one of the few large-scale skin mycobiome studies available. The second examines the forehead mycobiomes of healthy occupants within a residence in Berkeley, California, as part of a built environment (BE) microbiome investigation []. Each skin sample in that study was an integrated sample pooled from all cohabiting occupants of a household. It was selected because, as in the HK study, skin samples were collected from occupants within their predominant BE habitat. After raw data acquisition, sequences from both studies were processed with the same read quality control, OTU clustering, and taxonomic classification procedures described above.
HK samples were normalized by random subsampling to a depth of 1175 reads/sample using the QIIME script “multiple_rarefactions.py” []. The following metrics were computed on the normalized data using the QIIME scripts “alpha_diversity.py” and “beta_diversity.py” with default settings:
fungal community richness (observed OTUs and the Chao1 total OTU estimator);
diversity (Shannon and Simpson indices);
community membership (Jaccard distance, JD);
composition (Bray-Curtis dissimilarity, BCD).
Sixteen samples (all forearm sites from different households or individuals) had fewer than 1175 reads and were excluded from the normalized analyses.
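A minimal sketch of rarefaction followed by the four α-diversity metrics is given below. It assumes OTU tables stored as Python dicts mapping OTU to read count. Samples below the target depth are excluded, and each metric is averaged over ten rarefactions, as described for the HK data. Two defaults are assumptions, not confirmed from the QIIME scripts: the bias-corrected Chao1 formula and a base-2 logarithm for Shannon. All function names are hypothetical, and this is not the QIIME code.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from an OTU->count dict.

    Returns None when the sample holds fewer than `depth` reads, mirroring
    the exclusion of samples below the 1175-read threshold.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    return dict(Counter(rng.sample(pool, depth)))


def observed_otus(counts):
    return sum(1 for n in counts.values() if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 (assumed form): S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    f1 = sum(1 for n in counts.values() if n == 1)  # singletons
    f2 = sum(1 for n in counts.values() if n == 2)  # doubletons
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts, base=2):
    """Shannon entropy of relative abundances (base 2 assumed)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values() if n > 0)


def simpson(counts):
    """Gini-Simpson index, 1 - sum(p_i^2)."""
    total = sum(counts.values())
    return 1 - sum((n / total) ** 2 for n in counts.values())


def rarefied_alpha(counts, depth=1175, rounds=10, seed=0):
    """Average each metric over `rounds` independent rarefactions (None if too shallow)."""
    rng = random.Random(seed)
    metrics = {"observed": observed_otus, "chao1": chao1,
               "shannon": shannon, "simpson": simpson}
    sums = dict.fromkeys(metrics, 0.0)
    for _ in range(rounds):
        sub = rarefy(counts, depth, rng)
        if sub is None:
            return None
        for name, fn in metrics.items():
            sums[name] += fn(sub)
    return {name: s / rounds for name, s in sums.items()}
```

Subsampling without replacement from the expanded read pool matches the idea of random subsampling to an even depth. Averaging over rounds reduces the variance introduced by any single random draw.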
Good’s coverage exceeded 96 % for all remaining samples, indicating that the normalized depth was sufficient to capture the microbial diversity of each sample. Ten rounds of random rarefaction were performed for each sample at this depth. ANOSIM statistics based on BCD and JD were computed with the R package vegan (http://vegan.r-forge.r-project.org/). UniFrac distance was not used for the fungal data, as the sequenced ITS1 region is too variable to support informative phylogenetic analyses []. [...] The nonparametric Mann-Whitney (MW) and Kruskal-Wallis (KW) tests were used to assess significance for comparisons between two groups and among more than two groups, respectively. Where indicated, significant KW results were followed by post hoc pairwise comparisons between individual groups using the “kruskalmc” function in the R package pgirmess (http://cran.r-project.org/web/packages/pgirmess/index.html). For cross-domain α- and β-diversity correlations, Spearman’s correlation and linear regression fits were computed in R (http://www.r-project.org). To assess the correlation between fungal and bacterial community richness, α-diversity data (observed OTUs, Chao1 total OTU estimates, and Faith’s phylogenetic diversity) were correlated between the two domains. For cross-domain β-diversity correlations, fungal BCD values between sample pairs were correlated with both bacterial BCD and weighted UniFrac distances. Bacterial community diversity data were taken from a previous study [] targeting the 16S V4 region (primers 515fw: 5′-GTG CCA GCM GCC GCG GTA A-3′; 806rv: 5′-GGA CTA CHV GGG TWT CTA AT-3′) [, ]. This region was selected because it captures greater bacterial diversity than other hypervariable regions [].
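ANOSIM compares the ranks of between-group distances with those of within-group distances. The sketch below first computes Bray-Curtis (composition) and Jaccard (membership) distances. It then computes the ANOSIM R statistic: the mean between-group rank minus the mean within-group rank, divided by M/2, where M = n(n − 1)/2 is the number of sample pairs. A label-permutation p value is reported alongside R. This is a Python illustration under stated assumptions, not the vegan implementation used in the study. It assumes NumPy and SciPy are available, uses 999 permutations and a (hits + 1)/(permutations + 1) p value, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return np.abs(u - v).sum() / (u + v).sum()


def jaccard(u, v):
    """Jaccard distance on presence/absence (community membership)."""
    a, b = np.asarray(u) > 0, np.asarray(v) > 0
    union = np.logical_or(a, b).sum()
    return 1 - np.logical_and(a, b).sum() / union if union else 0.0


def distance_matrix(table, metric):
    """Symmetric samples x samples distance matrix from a list of count vectors."""
    n = len(table)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = metric(table[i], table[j])
    return d


def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2).

    All pairwise distances are ranked jointly, with ties averaged.
    """
    groups = np.asarray(groups)
    iu = np.triu_indices(len(groups), k=1)
    ranks = rankdata(dist[iu])
    within = groups[iu[0]] == groups[iu[1]]
    m = len(ranks)
    return (ranks[~within].mean() - ranks[within].mean()) / (m / 2)


def anosim(dist, groups, permutations=999, seed=0):
    """Return (R, permutation p value) obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    r_obs = anosim_r(dist, groups)
    hits = sum(anosim_r(dist, rng.permutation(groups)) >= r_obs
               for _ in range(permutations))
    return r_obs, (hits + 1) / (permutations + 1)
```

R approaches 1 when samples are more similar within groups than between them, and is near 0 when grouping explains no structure.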
Sparse Correlations for Compositional data (SparCC) were computed between bacterial and fungal OTUs using all quality-filtered reads, to detect co-abundance and co-exclusion correlations []. SparCC was chosen because it addresses the compositional bias introduced when correlating relative abundance data. SparCC correlations, network plots, and two-sided pseudo p values (p ≤ 0.05 considered significant) based on 100 repetitions were computed with Python scripts as described []. […]
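The core of SparCC can be sketched as follows, in four steps:
1. Component fractions are drawn from a Dirichlet posterior.
2. The variation matrix T_ij = Var[log(x_i/x_j)] is computed across samples.
3. Basis variances ω_i² are obtained by assuming that correlations are sparse. This gives the linear system t_i = Σ_j T_ij = (D − 2)ω_i² + Σ_j ω_j², where D is the number of components.
4. Correlations follow from ρ_ij = (ω_i² + ω_j² − T_ij)/(2ω_iω_j).

This is a simplified version under stated assumptions. It omits the iterative exclusion of strongly correlated pairs performed by the original SparCC code. Pseudo p values come from shuffling each component independently across samples, with 100 repetitions matching the text. The function names and defaults (20 Dirichlet draws, median across draws) are hypothetical.

```python
import numpy as np


def sparcc_basic(counts, n_draws=20, seed=0):
    """Simplified SparCC without iterative exclusion (illustrative sketch).

    counts: samples x components array of read counts.
    Returns a components x components correlation matrix (median over draws).
    """
    counts = np.asarray(counts, dtype=float)
    n_comp = counts.shape[1]
    if n_comp < 3:
        raise ValueError("this sketch needs at least 3 components")
    rng = np.random.default_rng(seed)
    # Sparsity assumption: t_i = (D - 2) * w_i^2 + sum_j w_j^2
    mod = (n_comp - 2) * np.eye(n_comp) + np.ones((n_comp, n_comp))
    cors = []
    for _ in range(n_draws):
        # Estimate fractions per sample from a Dirichlet(counts + 1) posterior
        fracs = np.vstack([rng.dirichlet(row + 1) for row in counts])
        logf = np.log(fracs)
        # Variation matrix: T_ij = Var(log(x_i / x_j)) across samples
        var_mat = (logf[:, :, None] - logf[:, None, :]).var(axis=0)
        basis_var = np.linalg.solve(mod, var_mat.sum(axis=1))
        basis_var = np.clip(basis_var, 1e-12, None)  # guard against negative variances
        sd = np.sqrt(basis_var)
        cor = (basis_var[:, None] + basis_var[None, :] - var_mat) / (2 * np.outer(sd, sd))
        cors.append(np.clip(cor, -1, 1))
    cor = np.median(cors, axis=0)
    np.fill_diagonal(cor, 1.0)
    return cor


def pseudo_pvalues(counts, n_perm=100, seed=0):
    """Two-sided pseudo p values: the fraction of shuffled datasets with |rho| >= observed."""
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    obs = sparcc_basic(counts, seed=seed)
    exceed = np.zeros_like(obs)
    for _ in range(n_perm):
        # Shuffle each component's counts independently across samples
        shuffled = np.column_stack([rng.permutation(col) for col in counts.T])
        perm = sparcc_basic(shuffled, seed=int(rng.integers(1 << 31)))
        exceed += np.abs(perm) >= np.abs(obs)
    return obs, exceed / n_perm
```

Working on log-ratios of fractions rather than raw relative abundances is what lets SparCC sidestep the spurious negative correlations induced by the constant-sum constraint.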

Pipeline specifications

Software tools: USEARCH, UPARSE, UCHIME, QIIME, vegan, UniFrac, SparCC
Applications: Phylogenetics, 16S rRNA-seq analysis
Organisms: Homo sapiens