Computational protocol: Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat

Protocol publication

[…] Sanger sequencing was performed at the Joint Genome Institute (under the Community Sequencing Program) using standard protocols for the MegaBace4000 capillary electrophoresis platform. Sequences were assembled with PHRAP using XplorSeq. 454 amplicon production was performed as previously described using barcoded 27F/338R primers targeting the V1/2 region of the bacterial 16S rRNA gene. Sequencing was performed per the manufacturer's protocols using the 454 FLX system (Branford, CT, USA). [...]
All data analysis was performed using the Quantitative Insights Into Microbial Ecology (QIIME) software package, version 1.3.0-dev. To be included in the analysis, Sanger reads were required to be ⩾1250 and <1550 bases in length, and 454 reads were required to be ⩾200 and <500 bases in length. Any reads containing N characters were discarded.
Operational taxonomic unit (OTU) picking was performed using an 'open-reference' OTU picking protocol, in which sequences are clustered against the Greengenes database, pre-clustered at 97% identity, using uclust. Reads that matched a Greengenes reference sequence at ⩾97% identity were clustered to an OTU defined by that reference sequence. Reads that did not match a reference sequence were subsequently clustered de novo. OTU picking (and all subsequent steps) was performed independently for the 454 and Sanger reads. OTUs observed in only one of the two replicate PCR and sequencing runs for each sample were discarded.
The centroid sequence in each cluster was chosen to represent the cluster and aligned with the Greengenes core set using PyNAST. Chimeric sequences, identified with ChimeraSlayer (run independently for the 454 and Sanger data sets), and reads that failed to align with PyNAST were excluded from subsequent analyses. PyNAST version 1.1 was used through QIIME with default parameters.
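The read-level and OTU-level filters described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the QIIME implementation: the function names, the dictionary-based inputs, and the constant names are hypothetical, and only the length windows and the "no N characters" and "both replicates" rules come from the text.

```python
# Length windows quoted in the protocol (inclusive lower bound, exclusive upper bound).
SANGER_WINDOW = (1250, 1550)
PYRO_WINDOW = (200, 500)  # 454 reads


def passes_read_filter(seq, window):
    """True if the read length is inside the window and the read has no N characters."""
    min_len, max_len = window
    return min_len <= len(seq) < max_len and "N" not in seq.upper()


def filter_reads(reads, window):
    """reads: dict of read id -> sequence (hypothetical input layout)."""
    return {rid: seq for rid, seq in reads.items() if passes_read_filter(seq, window)}


def otus_in_both_replicates(rep_a, rep_b):
    """rep_a, rep_b: dicts of OTU id -> read count for the two replicate runs
    of one sample. Returns the sorted OTU ids observed in both replicates."""
    seen_a = {otu for otu, n in rep_a.items() if n > 0}
    seen_b = {otu for otu, n in rep_b.items() if n > 0}
    return sorted(seen_a & seen_b)
```

For example, a 200-base 454 read without ambiguous bases is kept, while a 500-base read or any read containing an N is dropped; an OTU with reads in only one replicate is removed from that sample.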
Phylogenetic trees were constructed using FastTree, and taxonomic assignments were made for each representative sequence using tax2tree. Taxonomies were additionally assigned using the RDP Classifier retrained on the Greengenes database. RDP taxonomy assignments were largely consistent with the tax2tree taxonomy assignments, although fewer sequences were unassigned with tax2tree (data not shown).
Beta diversity between 16S rRNA samples was computed using unweighted UniFrac. Metagenomic data were obtained from MG-RAST, and distances between metagenomes were computed as Bray–Curtis distances using QIIME, based on counts of all metagenome reads that were classified in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Principal coordinates analysis (PCoA) plots were generated independently for the 454, Sanger, and metagenome data and compared using Procrustes analysis, using as input the matrix of unweighted UniFrac distances for 16S rRNA and the matrix of Bray–Curtis distances for KO groups from KEGG. Phylogenetic gain and OTU gain were computed using the unifrac_g and binary_otu_gain metrics in QIIME.
Detrending of PCoA plots was performed by fitting PC2 values to a quadratic function of PC1, projecting each point onto the fitted curve, and reporting the positions of the points along the curve as the transformed PC1 values and the residuals as the transformed PC2 values. This process was iterated until the average change in PC1 value was <0.001% of the range (five iterations). This novel but simple method allows detrending of even extreme curvature exhibiting a 'horseshoe' effect while generally preserving inter-sample distances, in contrast to well-known but controversial previous approaches to detrending ordination coordinates.
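A minimal pure-Python sketch of the detrending step is given here to make the description concrete. It is one reading of the text, not the authors' code. Several choices are assumptions: the fitted parabola is approximated by a fine polyline (`grid_size` segments) to find the closest point, the transformed PC1 is the arc length along the curve re-centred on its mean, the residual is signed by whether the point lies above or below the curve, and iteration stops when the mean absolute change in PC1 drops below `tol` times the PC1 range (`tol=1e-5` corresponds to 0.001%). All function and parameter names are hypothetical.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def detrend_once(pc1, pc2, grid_size=500):
    """One pass: fit the curve, project each point, return (arc position, residual)."""
    a, b, c = fit_quadratic(pc1, pc2)
    lo, hi = min(pc1), max(pc1)
    pad = 0.1 * (hi - lo)  # extend the curve slightly beyond the data
    step = (hi - lo + 2 * pad) / (grid_size - 1)
    gx = [lo - pad + step * i for i in range(grid_size)]
    gy = [a + b * x + c * x * x for x in gx]
    arc = [0.0]  # cumulative arc length at each polyline vertex
    for i in range(1, grid_size):
        arc.append(arc[-1] + math.hypot(gx[i] - gx[i - 1], gy[i] - gy[i - 1]))
    new1, new2 = [], []
    for x, y in zip(pc1, pc2):
        best_d2, best_pos = float("inf"), 0.0
        for i in range(grid_size - 1):  # closest point on each segment
            dx, dy = gx[i + 1] - gx[i], gy[i + 1] - gy[i]
            seg2 = dx * dx + dy * dy
            u = ((x - gx[i]) * dx + (y - gy[i]) * dy) / seg2
            u = min(1.0, max(0.0, u))
            d2 = (x - gx[i] - u * dx) ** 2 + (y - gy[i] - u * dy) ** 2
            if d2 < best_d2:
                best_d2, best_pos = d2, arc[i] + u * math.sqrt(seg2)
        sign = 1.0 if y >= a + b * x + c * x * x else -1.0
        new1.append(best_pos)
        new2.append(sign * math.sqrt(best_d2))
    centre = sum(new1) / len(new1)
    return [v - centre for v in new1], new2


def detrend(pc1, pc2, tol=1e-5, max_iter=20):
    """Repeat detrend_once until the mean change in PC1 is below tol * PC1 range."""
    for _ in range(max_iter):
        new1, new2 = detrend_once(pc1, pc2)
        change = sum(abs(n - o) for n, o in zip(new1, pc1)) / len(new1)
        span = max(new1) - min(new1)
        pc1, pc2 = new1, new2
        if change < tol * span:
            break
    return pc1, pc2
```

Calling `detrend(pc1, pc2)` on two equal-length lists of coordinates returns the transformed PC1 and PC2 lists. Because positions are measured along the curve, samples that are far apart along a horseshoe stay far apart after the transform, which is the property the text emphasises.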
To justify our approach, we show that the analysis preserves the correlation of PC1 with the expected dominant gradient, sample depth (Spearman's rank correlation coefficient=0.976, P<1 × 10^−23), while removing most of the correlation between PC2 and depth (mean Spearman's rank correlation coefficient before=0.9511, P<1 × 10^−8; after=0.2579, P>0.05; correlation is calculated separately below and above the numerical midpoint of the gradient, then averaged). We also show that this approach generally preserves the original ordination distances (Spearman's rank correlation coefficient=0.9019, P<1 × 10^−230). […]
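The split-gradient correlation check can be illustrated with a short pure-Python sketch. It assumes that the "numerical midpoint" is the midpoint between the smallest and largest depth, and that the absolute values of the two half-gradient coefficients are averaged (a horseshoe gives opposite signs on the two halves); neither detail is stated in the text. The function names are hypothetical, and P-values are not computed here (a statistics library such as SciPy's `scipy.stats.spearmanr` would supply them).

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = shared
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def split_half_spearman(gradient, values):
    """Mean absolute Spearman coefficient below and above the gradient midpoint.
    Averaging absolute values is an assumption of this sketch."""
    mid = (min(gradient) + max(gradient)) / 2
    lower = [(g, v) for g, v in zip(gradient, values) if g < mid]
    upper = [(g, v) for g, v in zip(gradient, values) if g >= mid]
    rhos = [abs(spearman(*zip(*half))) for half in (lower, upper)]
    return sum(rhos) / len(rhos)
```

A perfectly symmetric horseshoe shows why the split is needed: the coefficient over the whole gradient is near zero even though PC2 tracks depth closely on each half, so the per-half average is the more informative measure of residual curvature.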

Pipeline specifications