Computational protocol: Characterization of the Active Microbiotas Associated with Honey Bees Reveals Healthier and Broader Communities when Colonies are Genetically Diverse

Protocol publication

[…] FASTA-formatted sequences and corresponding quality scores were extracted from the .sff data file generated by the pyrosequencer using the GS Amplicon software package (Roche, Branford, CT). All data pre-processing, analysis of operational taxonomic units (OTUs), phylotype analysis and hypothesis testing were performed using modules implemented in the Mothur software platform. Pooled sequences were binned according to the colony from which they were derived using the unique barcodes on the primers (these were removed prior to downstream analyses). Primer regions were also removed from the sequences at this point. Sequence length and quality were evaluated for each read; sequences were culled if their length was <300 bp or >500 bp, their average SFF quality score was <30, they contained any ambiguous base calls, or they did not match any of the primers or colony barcode identifiers. The data set was simplified by using the “unique.seqs” command to generate a non-redundant (unique) set of sequences. Unique sequences were aligned using the “align.seqs” command and an adaptation of the Bacterial SILVA SEED database as a template (available at: http://www.mothur.org/wiki/Alignment_database). To ensure that we were analyzing comparable regions of the 16S rRNA gene across all reads, sequences that started before the 2.5-percentile or ended after the 97.5-percentile in the alignment were filtered out. Sequences were denoised using the “pre.cluster” command. This command applies a pseudo-single linkage algorithm with the goal of removing sequences that are likely due to pyrosequencing errors. A total of 2,154 potentially chimeric sequences were detected and removed using the “chimera.slayer” command. Aligned sequences were clustered into OTUs (defined by 97% similarity) using the average neighbour algorithm. Rarefaction curves were plotted for each sample and a weighted UniFrac dendrogram was generated using the UniFrac module implemented in Mothur.
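The read-culling criteria described above can be sketched in Python as follows. This is only an illustration of the stated thresholds, not the Mothur implementation: the function name `passes_filter` and the constants are hypothetical, and the sketch assumes that barcodes and primers have already been trimmed and that quality scores arrive as a list of per-base integers.

```python
from statistics import mean

# Thresholds taken from the protocol text; the names are illustrative.
MIN_LEN = 300          # reads shorter than this are culled
MAX_LEN = 500          # reads longer than this are culled
MIN_AVG_QUAL = 30      # reads with a lower average quality are culled
UNAMBIGUOUS = set("ACGT")


def passes_filter(seq, quals):
    """Return True if a trimmed read meets the length, average-quality,
    and no-ambiguous-base criteria."""
    if not (MIN_LEN <= len(seq) <= MAX_LEN):
        return False
    if not quals or mean(quals) < MIN_AVG_QUAL:
        return False
    # Any character other than A, C, G or T counts as an ambiguous call.
    return set(seq.upper()) <= UNAMBIGUOUS
```

A read is kept only when all three checks pass; matching against primers and barcodes is a separate step that this sketch does not cover.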
The UniFrac algorithm assigns a distance between different microbial communities based on the composition of lineages found in each sample. Importantly, UniFrac takes into account the phylogenetic relatedness of lineages in each sample. All community diversity parameters (Shannon-Weaver, Chao1, and Simpson's) were calculated as described in the Mothur software manual. Sequences were taxonomically classified by the RDP-II Naive Bayesian Classifier using a 60% confidence threshold. Sequences that could not be classified to at least the kingdom level were excluded from subsequent diversity analyses. Venn diagrams and heatmap figures were generated using custom Perl scripts. Pyrosequence data sets are available through the EBI/DDBJ Sequence Read Archive accession number DRA000526. Based on these procedures, we use the term “species” throughout to refer to operational taxonomic units (OTUs) at a 97% sequence-identity threshold. [...] Pearson correlations and Mann-Whitney U tests used the classification data generated through the Mothur pipeline (described above) and were run in the statistical package SPSS. Bootstrap analyses (5,000 replicates per analysis) were also based on the classification data; means, standard deviations of the mean differences, and confidence intervals were computed using an in-house Perl script. The bootstrap analysis was performed such that a randomly selected 10 of the 12 genetically diverse colonies were compared to the 10 genetically uniform colonies. For each sampling, the differences between colony types in the total number of species and in the number of sequences affiliating with known pathogens or Bifidobacterium were calculated. 95% confidence intervals (CI) around the mean difference values were calculated, and the null hypothesis that there was no effect of increased within-colony diversity was rejected if zero was not included in the CI. […]
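The resampling comparison described above can be sketched in Python as follows. The authors' in-house Perl script is not available here, so several details are assumptions: the function names are hypothetical, each colony is represented by a single number (for example its species count), the colony types are compared by the difference in per-colony means, and the 95% CI is taken as a percentile interval over the replicate differences.

```python
import random
from statistics import mean, stdev


def subsample_difference_ci(diverse, uniform, n_reps=5000, alpha=0.05, seed=0):
    """Repeatedly draw len(uniform) colonies from `diverse` without
    replacement and record the difference in means against `uniform`.

    Returns (mean difference, SD of differences, (ci_low, ci_high)).
    The percentile CI is an assumption, not the authors' stated method.
    """
    rng = random.Random(seed)
    k = len(uniform)              # e.g. 10 of the 12 diverse colonies
    uniform_mean = mean(uniform)
    diffs = []
    for _ in range(n_reps):
        sample = rng.sample(diverse, k)
        diffs.append(mean(sample) - uniform_mean)
    diffs.sort()
    lo_idx = int(n_reps * alpha / 2)
    hi_idx = n_reps - 1 - lo_idx  # symmetric upper index
    return mean(diffs), stdev(diffs), (diffs[lo_idx], diffs[hi_idx])


def rejects_null(ci):
    """The null hypothesis of no difference is rejected if zero lies outside the CI."""
    low, high = ci
    return not (low <= 0 <= high)
```

Because the 10 uniform colonies are fixed, the spread of the differences in this sketch comes only from which 10 diverse colonies are drawn in each replicate.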

Pipeline specifications

Software tools: mothur, UniFrac, SPSS
Applications: Miscellaneous, Phylogenetics, 16S rRNA-seq analysis
Organisms: Apis mellifera, Homo sapiens