Computational protocol: Amplicon-based metagenomics identified candidate organisms in soils that caused yield decline in strawberry

Similar protocols

Protocol publication

[…] Raw sequences were automatically de-multiplexed by the Illumina MiSeq and then further quality-filtered by the QIIME analysis pipeline: (1) removing primers from sequences, (2) removing low-quality reads, (3) identifying an OTU for each sequence against two international databases: 16S (bacteria) – Silva and the UNITE fungal 18S ITS database at 97% similarity, (4) storing every unique sequence and its frequency in each sample. Finally, we wrote a small utility programme in Delphi to (1) produce summary for OTU frequency for each sample, (2) merge the OTU frequency data over all samples and (3) produce an overall OTU table in the BIOM format, enabling further downstream statistical analysis.We also customised the UNITE fungal ITS database to include ITS sequences for oomycetes and vascular wilts; all oomycete ITS sequences were first obtained from international repositories – if multiple sequences were available for a single species then a consensus sequence was generated with Geneious version 6.1 (Biomatters Ltd). Custom Perl scripts were written to query the Index Fungorum database (http://www.indexfungorum.org/), using the Index Fungorum Fungus API to query the website to return taxonomic information on the additional species, absent from the UNITE database. Both the consensus sequences and the linked taxonomic information were then appended to the UNITE database files. Full scripts are available for download from (https://github.com/eastmallingresearch/metagenomics). [...] Two types of statistical analyses were performed: (1) initial exploratory analysis and (2) detecting differential OTU abundance between samples from different sites to identify candidates responsible for the yield-decline phenomenon.Individual sample diversity (i.e., α diversity) was calculated: the number of distinct OTUs observed per sample (Sobs) and Shannon and Simpson indices (both related to the frequency of individual OTUs within a sample). To reduce biases in sequencing depth on these indices, a re-sampling (i.e. bootstrap) scheme was used to estimate α diversity indices for each sample. A bootstrap sample was obtained via randomly sampling a minimum number of sequences from the sequences in each sample (i.e., rarefying). All indices are calculated from each bootstrap sample at the rarefaction point – a total of 25 bootstraps were conducted. Mean indices were calculated from these bootstrap samples. Next, we calculated diversity indices among samples (i.e. β diversity): Morrison-Horn (MH), Bray-Curtis (BC) and ThetaYC (YC) indices. MH measures the similarity between two samples whereas the other two methods measure the dissimilarity between two samples. Both α and β diversity indices were calculated using the Win64 version of the Explicet software. A principal component analysis was conducted to detect overall differences between samples using the STAMP programme.To assess differential OTU abundance among treatments, we used the DESeq2 statistical package in R. DESeq2 was developed for comparing differential gene expressions but is equally applicable to analysis of metagenomic data. DESeq2 provides a new statistical fitting routine to account for variance heterogeneity often observed in sequence data; it uses the negative binomial distribution as an error distribution to compare abundance of each OTU between groups of samples in the framework of generalised linear modelling. This method was superior to other methods commonly used for this purpose; the same study also showed that rarefying samples is inferior to not rarefying in identifying differences in OTU abundances but with correct distribution assumptions. Thus, in the present study, rarefying was not used when comparing OTU abundance. The median-of-ratios method was used to normalise the data to correct for unequal sequencing depth; this procedure was implemented as a default in DESeq2. To correct false discovery rate associated with multiple testing, DESeq2 uses the Benjamini-Hochberg (BH) adjustment. In addition, DESeq2 also implemented an algorithm to adjust for large variability in log-fold changes for small counts. Candidate OTUs were selected at the 5% significance level (BH adjusted). Any OTU with the total number of reads across all the samples less than three was omitted from differential abundance testing.In comparing OTU abundance, seven different comparisons were made, taking into account the potentially complex nature of candidate organisms, and the persistency of chloropicrin effect in relation to sampling time. The first five comparisons were between sites with and without the yield-decline phenomenon; only samples collected in 2014 were used for these five comparisons since they were sampled at the same time. In these comparisons, chloropicrin-treated soil samples from HB12 and PV12 were excluded because it is not certain whether the chloropicrin effect could last for more than two years. These five comparisons were: EM13/HB13 (12 samples) vs. HB12/PV12 (six samples);EM13 (six samples) vs. HB12 (three samples);EM13 (six samples) vs. PV12 (three samples);HB13 (six samples) vs. HB12 (three samples);HB13 (six samples) vs. PV12 (six samples).The four individual comparisons (b–e) are necessary since it is possible that yield decline may be caused by more than one organism; thus an overall comparison, as in (1), may not be able to reveal all possible candidates because of potential site-to-site variability in the abundance of candidate organisms. The final two comparisons were:2013 chloropicrin (three samples) vs. other treatments (nine samples) at PV12;2014 chloropicrin (three samples) vs. the untreated (three samples) at HB12. EM13/HB13 (12 samples) vs. HB12/PV12 (six samples);EM13 (six samples) vs. HB12 (three samples);EM13 (six samples) vs. PV12 (three samples);HB13 (six samples) vs. HB12 (three samples);HB13 (six samples) vs. PV12 (six samples).The four individual comparisons (b–e) are necessary since it is possible that yield decline may be caused by more than one organism; thus an overall comparison, as in (1), may not be able to reveal all possible candidates because of potential site-to-site variability in the abundance of candidate organisms. The final two comparisons were:2013 chloropicrin (three samples) vs. other treatments (nine samples) at PV12;2014 chloropicrin (three samples) vs. the untreated (three samples) at HB12.For comparison (f), only 2013 samples were used because the sampling time elapsed was close to two years after the application of treatments at PV12, similar to the time elapsed for the 2014 samples at HB12. […]

Pipeline specifications

Software tools QIIME, Geneious, Explicet, STAMP, DESeq2
Application Metagenomic sequencing analysis
Organisms Verticillium dahliae
Diseases Mycoses