Computational protocol: Bacterial diversity in Buruli ulcer skin lesions: Challenges in the clinical microbiome analysis of a skin disease

Protocol publication

[…] Before processing Illumina sequencing data (See ), we performed an overall quality assessment of the raw reads using FastQC v.0.11.3. [] The sequence read pairs were then merged using their overlapping portions, filtered to remove very short (<45 bp) and very long (>610 bp) sequences, and screened to remove reads with ambiguous nucleotide calls using the MOTHUR v.1.34.1 software [], following the standard operating procedure for the MiSeq system. [] From these reads we removed the remaining human DNA by classifying them against a database of the human genome (GRCh38) with the Kraken tool (v. 0.10.5). [] Chimeras were removed from our dataset in QIIME v.1.9.0 [] by comparing the reads against the chimera-checked Greengenes 16S rRNA sequence database (v. 13.5) [] using USEARCH version 6.1. [] Afterwards, these reads were combined with the pre-processed reads from healthy skin swabs from the public studies. To remove other PCR artefacts and to keep only reads of the correct marker gene, we applied a prefiltering step that discarded reads with less than 60% similarity to the reference database before picking operational taxonomic units (OTUs). Open-reference OTU picking was then performed, consisting of a closed-reference step that clusters the sequences against the Greengenes database at 97% similarity with UCLUST [] and a de novo step that clusters the remaining unmatched sequences against each other at 97% similarity. [] As a last step, singletons were removed so that only OTUs with a minimum of two sequences were kept in the OTU table.

Next, we attempted to identify and remove potential contaminating OTUs originating from reagents used during DNA extraction and 16S rRNA library preparation. For that purpose, we applied the method of Jervis-Bardy et al. [], in which the relative abundances of the OTUs were correlated with the amplicon concentrations of the samples after library preparation in R v.3.0.0.
[] A significant inverse Spearman correlation would indicate a contaminating OTU that needed to be filtered from our OTU table.

After contaminant removal, the OTU table was first rarefied so that all samples were brought to the same sequencing depth of 1000 sequences. Second, because the 16S rRNA gene is present in multiple copies in some bacterial genomes, species with high copy numbers receive inflated counts. [] We accounted for this bias by dividing the OTU counts by the predicted 16S copy number of the associated species using PICRUSt (online Galaxy version 1.0.0). [] This produced a normalized OTU table in which all OTUs were retained but their counts were corrected for copy number.

The α- and β-diversity measurements were performed using QIIME. We analyzed the bacterial diversity within each of the three groups (α-diversity) and compared the diversity between the three groups (β-diversity). For the α-diversity, we estimated OTU richness with the Chao1 index, while overall diversity, which reflects both richness and evenness, was measured with the Shannon index and Simpson’s index. [–] For the β-diversity, we applied the non-phylogenetic Bray-Curtis dissimilarity. [] To assess the reliability of the estimates, we applied jackknifing, in which the OTU table was repeatedly subsampled and the Bray-Curtis dissimilarities were recalculated 100 times. These Bray-Curtis dissimilarities were used for a Principal Coordinates Analysis (PCoA).

To test for significant differences in α-diversity between BU lesions, non-BU lesions and healthy skin, a non-parametric two-sample t-test using Monte Carlo permutations with Bonferroni multiple-test correction (P≤0.05) was employed in QIIME. Statistical comparison of the metagenomes of the samples, to distinguish ecological influences such as BU/non-BU disease, was performed using STAMP.
[] With this tool, taxonomic and compositional differences can be assessed between BU and non-BU samples, and between the healthy microbiome and the BU/non-BU microbiome, by comparing the abundance of metagenomic sequences. All two-group comparisons were performed using Welch’s t-test with Benjamini-Hochberg FDR multiple-test correction (P≤0.05). Reported p-values are those corrected for multiple testing. […]
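
The read screen applied in mothur (merged reads of 45 to 610 bp with no ambiguous calls) can be illustrated with a minimal Python sketch. It is not mothur's implementation: it assumes the reads are already merged into plain sequence strings and treats any character other than A, C, G or T as an ambiguous call. The names `screen_reads`, `MIN_LEN` and `MAX_LEN` are hypothetical.

```python
# Minimal sketch of the length/ambiguity screen (not mothur's code).
# Assumption: reads are merged sequence strings; anything outside ACGT
# (e.g. N or other IUPAC codes) counts as an ambiguous base call.
MIN_LEN = 45   # reads shorter than 45 bp are discarded
MAX_LEN = 610  # reads longer than 610 bp are discarded
UNAMBIGUOUS = set("ACGT")


def screen_reads(reads):
    """Return the upper-cased reads that pass the length and ambiguity filters."""
    kept = []
    for seq in reads:
        seq = seq.upper()
        if not (MIN_LEN <= len(seq) <= MAX_LEN):
            continue  # too short or too long
        if set(seq) - UNAMBIGUOUS:
            continue  # contains at least one ambiguous base call
        kept.append(seq)
    return kept
```

For example, `screen_reads(["ACGT" * 20, "ACGN" * 20])` keeps only the first, 80 bp read, because the second contains `N`.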
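
The contaminant screen of Jervis-Bardy et al. flags OTUs whose relative abundance falls as amplicon concentration rises, since reagent contaminants make up a larger share of low-yield libraries. The Python sketch below illustrates the idea; it is not the R code used in the study. Spearman's coefficient is computed as the Pearson correlation of average ranks, and significance is assessed with a one-sided permutation test, which is an assumed substitute for whatever test was run in R. The function names, the 999 permutations and the 0.05 default are illustrative.

```python
import random


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def spearman(x, y):
    return pearson(rank(x), rank(y))


def flag_contaminants(rel_abund, amplicon_conc, alpha=0.05, n_perm=999, seed=1):
    """rel_abund maps OTU id -> per-sample relative abundances aligned with amplicon_conc."""
    rng = random.Random(seed)
    flagged = []
    for otu, abund in rel_abund.items():
        rho = spearman(abund, amplicon_conc)
        if rho >= 0:
            continue  # only inverse correlations point to contamination
        shuffled = list(amplicon_conc)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)
            if spearman(abund, shuffled) <= rho:
                hits += 1
        if (hits + 1) / (n_perm + 1) <= alpha:  # one-sided permutation p-value
            flagged.append(otu)
    return flagged
```

Flagged OTUs would then be dropped from the OTU table before rarefaction.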
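
The OTU-table steps above (singleton removal, rarefaction to 1000 sequences, division by predicted 16S copy number) can be sketched on a plain dictionary table of the form `{sample: {otu: count}}`. This is an illustration, not QIIME's or PICRUSt's code. PICRUSt supplies the copy-number predictions; here they are passed in as a dictionary, and the fallback copy number of 1 for OTUs without a prediction is an assumption. Dropping samples that cannot reach the rarefaction depth is also assumed.

```python
import random
from collections import Counter


def remove_singletons(table):
    """Drop OTUs whose total count across all samples is below 2."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)
    keep = {otu for otu, n in totals.items() if n >= 2}
    return {s: {o: n for o, n in c.items() if o in keep} for s, c in table.items()}


def rarefy(table, depth=1000, seed=0):
    """Subsample each sample to `depth` reads without replacement."""
    rng = random.Random(seed)
    out = {}
    for sample, counts in table.items():
        pool = [otu for otu, n in counts.items() for _ in range(n)]
        if len(pool) < depth:
            continue  # assumption: samples below the target depth are dropped
        out[sample] = dict(Counter(rng.sample(pool, depth)))
    return out


def normalize_by_copy_number(table, copy_numbers):
    """Divide each OTU count by its predicted 16S copy number (missing -> 1.0, assumed)."""
    return {
        s: {o: n / copy_numbers.get(o, 1.0) for o, n in c.items()}
        for s, c in table.items()
    }
```

Rarefying before copy-number correction follows the order given in the text; the corrected counts are no longer integers, which is why the normalized table keeps every OTU but changes its counts.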
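
The three α-diversity indices can be written down directly from per-OTU counts. The sketch below uses the bias-corrected Chao1 form S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen once and twice, and the 1 − Σp² form of Simpson's index. Other tools report Simpson as Σp² or 1/Σp², and the logarithm base for Shannon differs between toolkits, so `base` is left as a parameter; none of this is claimed to match QIIME's exact defaults.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate."""
    counts = [c for c in counts if c > 0]
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)); base is a free choice."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson's index as 1 - sum(p_i^2), the chance two random reads differ in OTU."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)
```

For example, `chao1([1, 1, 2, 4])` gives 4 + 2·1/(2·2) = 4.5, and two equally abundant OTUs give a base-2 Shannon index of 1 and a Simpson index of 0.5.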
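
Bray-Curtis dissimilarity between two samples is Σ|xᵢ − yᵢ| / Σ(xᵢ + yᵢ) over all OTUs. The sketch below repeats an even-depth subsample 100 times, recomputes the full dissimilarity matrix each time and averages the results, which is one plausible reading of the jackknifing step rather than QIIME's implementation. The subsampling depth is not stated in the text, so `depth` is a free parameter, and all function names are hypothetical.

```python
import random
from collections import Counter


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {otu: count} profiles."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement (raises ValueError if too few reads)."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def jackknifed_bray_curtis(table, depth, iterations=100, seed=0):
    """Return sample order, the mean matrix and the per-iteration matrices."""
    rng = random.Random(seed)
    samples = sorted(table)
    runs = []
    for _ in range(iterations):
        sub = {s: subsample(table[s], depth, rng) for s in samples}
        runs.append([[bray_curtis(sub[x], sub[y]) for y in samples] for x in samples])
    n = len(samples)
    mean = [[sum(r[i][j] for r in runs) / iterations for j in range(n)] for i in range(n)]
    return samples, mean, runs
```

The spread of each entry across `runs` indicates how stable that dissimilarity is under subsampling, and the mean matrix is a natural input for PCoA.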
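
Principal Coordinates Analysis turns a dissimilarity matrix into coordinates whose Euclidean distances approximate the input. A minimal classical (Torgerson) PCoA sketch with NumPy is shown below: double-centre the squared distances, take the leading eigenvectors and scale them by the square roots of their eigenvalues. Clipping negative eigenvalues, which Bray-Curtis matrices can produce because the measure is not Euclidean, is one simple choice among several; the function name is hypothetical.

```python
import numpy as np


def pcoa(distances, n_axes=2):
    """Classical PCoA of a symmetric distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                   # double-centred (Gower) matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # returned in ascending order
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals[:n_axes], 0, None) # ignore negative (non-Euclidean) parts
    coords = eigvecs[:, :n_axes] * np.sqrt(positive)
    explained = eigvals[:n_axes] / eigvals[eigvals > 0].sum()
    return coords, explained
```

For three points at positions 0, 1 and 2 on a line, the first axis recovers their spacing (up to sign) and explains essentially all of the variance.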
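
A Monte Carlo two-sample t-test compares the observed t statistic with its distribution under random reassignment of samples to groups, so it does not rely on the t distribution. The sketch below pairs such a test with a Bonferroni correction for the pairwise group comparisons. Using the pooled-variance (Student) t statistic, 999 permutations and a two-sided p-value are assumptions, not QIIME's documented settings, and the function names are hypothetical.

```python
import random
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se else 0.0


def mc_t_test(a, b, permutations=999, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(pooled_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        if abs(pooled_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

With three groups (BU, non-BU, healthy) there are three pairwise comparisons, so each permutation p-value would be multiplied by 3 before comparing it with 0.05.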
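
STAMP runs Welch's t-test and the Benjamini-Hochberg correction internally. The sketch below shows only the two formulas involved: the Welch statistic with Welch-Satterthwaite degrees of freedom, which does not assume equal group variances, and the BH step-up adjustment, which turns raw p-values into FDR-adjusted values in their original order. Turning the Welch statistic into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` in SciPy. The function names here are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The running minimum keeps the adjusted values monotone in the ranks of the raw p-values; an OTU or taxon would be reported as different between groups when its adjusted value is at most 0.05.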

Pipeline specifications

Software tools: FastQC, mothur, Kraken, QIIME, USEARCH, UCLUST, PICRUSt, STAMP
Databases: Greengenes
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Mycobacterium ulcerans, Homo sapiens
Diseases: Communicable Diseases, Skin Diseases, Skin Ulcer, Systemic Mastocytosis