Computational protocol: Size Matters: Assessing Optimum Soil Sample Size for Fungal and Bacterial Community Structure Analyses Using High Throughput Sequencing of rRNA Gene Amplicons

Protocol publication

[…] Raw 28S rRNA gene sequences were filtered for minimum length (400 bp), quality (Q > 20) and primer match, and sorted by barcode, using the RDP pyrosequencing pipeline. Chimeras were identified and removed using UCHIME in de novo mode, and the remaining sequences were randomly resampled to 4,300 sequences per sample using MOTHUR. Three samples that did not meet the minimum resampling depth were discarded. The remaining 266,600 sequences were aligned and then clustered at 5% nucleotide dissimilarity, and representative sequences were generated for each OTU using RDP tools hosted on the Michigan State University High Performance Computing Center servers. The RDP Fungal Classifier, based on training set 11, was used to classify each cluster's representative sequence.

Bacterial 16S rRNA gene amplicons were sequenced on the Illumina MiSeq platform (2 × 250 bp paired-end reads). Raw reads were assembled using a modified PandaSeq with a minimum overlap of 50 bp, minimum and maximum lengths of 220 and 280 bp, respectively, and a minimum Q score of 28, as determined by defined community analysis using RDP tools. All computation was performed on the MSU High Performance Computing Center servers. UCHIME was used to identify and remove chimeras, followed by resampling at 23,000 sequences per sample using MOTHUR, alignment, and clustering at 3% nucleotide dissimilarity. Representative sequences were classified using the RDP Classifier with training set 9 at 80% confidence.

Raw cluster abundances were Hellinger transformed and a Bray-Curtis dissimilarity matrix (+1) was constructed; statistical analyses and diversity estimates were then performed using PRIMER-E. Statistical analyses were based on four replicates from each of the four field GPS locations (n = 16), except for SARDI 10 g and 100 g extractions (n = 8). Cluster analysis was performed with the Similarity Profile analysis (SIMPROF) test.
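As a rough illustration of the transformation step described above, the following Python sketch applies a Hellinger transformation to raw cluster counts and then builds a Bray-Curtis dissimilarity matrix. It is not the PRIMER-E implementation used in the protocol: the function names and the example counts are hypothetical, and the "(+1)" adjustment mentioned in the text is left out because the excerpt does not define it.

```python
import math


def hellinger(counts):
    """Hellinger-transform one sample: square root of relative abundances."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis matrix on Hellinger-transformed samples."""
    transformed = [hellinger(s) for s in samples]
    n = len(transformed)
    return [
        [bray_curtis(transformed[i], transformed[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical OTU counts (rows = samples, columns = OTUs); not data from the study.
samples = [
    [10, 0, 5],
    [10, 0, 5],
    [0, 8, 0],
]
matrix = dissimilarity_matrix(samples)
```

In this toy input the first two samples are identical and the third shares no OTUs with them, so the matrix spans the full range of the measure, from 0 (identical composition) to 1 (no shared clusters).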
Significant differences in community structure were tested using Permutational Multivariate Analysis of Variance (PERMANOVA) and Analysis of Similarities (ANOSIM). Sample replicate dispersion was tested by Permutational Analysis of Multivariate Dispersions (PERMDISP) and a test for Multivariate Dispersion (MVDISP). ANOVA statistics for Shannon diversity (H’), Pielou’s Evenness (J), Margalef’s Richness (d) and the number of individuals (N) were calculated using Minitab 16 (Minitab Inc, USA). Sequences were deposited in the European Nucleotide Archive under study PRJEB8081 with accession numbers ERS632772–632841 and ERS671660–ERS671724. […]
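The diversity measures named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PRIMER-E or Minitab calculation: it assumes natural logarithms and the standard textbook definitions (H’ = −Σ p ln p, J = H’ / ln S, d = (S − 1) / ln N), and the function name and example counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Return N, S, Shannon H', Pielou J and Margalef d for one sample.

    Assumes natural logarithms; zero-count clusters are ignored.
    """
    present = [c for c in counts if c > 0]
    if not present:
        raise ValueError("sample contains no individuals")
    n = sum(present)   # number of individuals (N)
    s = len(present)   # number of observed clusters (S)
    shannon = -sum((c / n) * math.log(c / n) for c in present)
    # J and d are undefined for a single cluster or a single individual.
    pielou = shannon / math.log(s) if s > 1 else float("nan")
    margalef = (s - 1) / math.log(n) if n > 1 else float("nan")
    return {"N": n, "S": s, "H": shannon, "J": pielou, "d": margalef}


# Hypothetical, perfectly even sample of four clusters (not study data).
result = diversity_indices([25, 25, 25, 25])
```

For a perfectly even sample such as this one, evenness J reaches its maximum of 1 and H’ equals ln S, which makes the function easy to sanity-check before applying it to resampled cluster tables.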

Pipeline specifications

Software tools: UCHIME, mothur, PANDAseq, RDP Classifier
Application: 16S rRNA-seq analysis
Diseases: Pulmonary Fibrosis