Computational protocol: High reactivity of deep biota under anthropogenic CO2 injection into basalt

Protocol publication

[…] Only high-quality partial sequences (700–800 bp) were retained for subsequent analyses. We discarded sequences of poor quality or potential chimeras. Preliminary distance (neighbor-joining) trees allowed the identification of groups of highly similar sequences (>97% identity), or phylotypes. One representative clone of each phylotype was used for taxonomic affiliation with the SINA software. Partial sequences were compared to those in databases using the Basic Local Alignment Search Tool (BLAST). [...]

A total of 124 high-quality partial sequences (700–800 bp) were obtained from the 10 16S-rRNA gene libraries and selected for subsequent phylogenetic analyses. Preliminary distance (neighbor-joining) trees allowed the identification of groups of highly similar sequences (>97% identity), or phylotypes. Several representative clones of the different OTUs were nearly fully sequenced. Complete sequences were assembled using CodonCode Aligner prior to phylogenetic analyses. Potential chimeric sequences were identified manually by comparing several portions of the full-length environmental 16S-rRNA gene sequences with sequences in the GenBank database using BLAST, in addition to UCHIME and DECIPHER. Sequences were aligned with the ARB software and then added to the reference tree using the Parsimony tool. Sequences from the clone libraries, along with closely related environmental clones, closest cultivated members and some representative sequences of the major taxa, were selected for phylogenetic tree construction. The selected sequences were used as input to build phylogenetic trees by maximum likelihood with RAxML using a general time reversible model of sequence evolution, taking among-site rate variation into account with four substitution rate categories and an estimated Γ distribution. Base frequencies and the proportion of invariable sites were empirically estimated. ML bootstrap proportions were inferred using 1,000 replicates. [...]
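
As an illustration of the >97% identity criterion used above to define phylotypes, the following is a minimal sketch only. The protocol identified groups from neighbor-joining trees; this sketch swaps in a simpler greedy clustering on sequences that are assumed to be already aligned to the same length. The function names `identity` and `group_phylotypes` are hypothetical and do not come from the protocol.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences.

    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def group_phylotypes(seqs: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy grouping (an assumption, not the tree-based method of the protocol).

    Each sequence joins the first group whose representative (the first
    member of that group) it matches with identity above `threshold`;
    otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for name, seq in seqs.items():
        for group in groups:
            representative = seqs[group[0]]
            if identity(seq, representative) > threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

The first member of each group plays the role of the representative clone. Greedy grouping depends on input order, which is one reason a tree-based inspection, as in the protocol, can give different groups.
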
Pyrosequence data were processed using the QIIME pipeline, following standard practice and the algorithms incorporated into QIIME. Reads were quality filtered by removing reads with unresolved bases and/or anomalous read length. Sample barcode and primer sequences were trimmed from the proximal end of the sequences without any truncation at the 3′ end, as recommended for proper use of Denoiser. Reads shorter than 300 nucleotides or containing one or more ambiguities were removed. This process was done twice in order to apply Denoiser to both the forward and the reverse sets of sequences. The reverse primer, the barcode and the sequence following them were then removed from both sets. Reverse sequences were reversed before concatenating the two data sets. OTUs were picked using Uclust. Alignment was performed with PyNAST against the SILVA 111 reference database before chimera checking with ChimeraSlayer. Sequences were assigned taxonomy at 97% similarity using the RDP classifier 2.2 and the SILVA 111 reference database, with a confidence >50%. The alignment was then filtered with an entropy threshold of 10%, and positions with more than 80% gaps were removed. A tree was built with FastTree 2.1.3 for subsequent analyses. Singletons and sequences not affiliated to an OTU were removed.

UniFrac was used for Principal Coordinates Analysis (PCoA) following standard practice adapted to the samples, with sampling depth equalized on the basis of rarefaction analyses to remove sample heterogeneity. Beta-diversity analysis was performed using jackknife replicates in order to estimate the uncertainty in PCoA plots and hierarchical clustering of microbial communities. A beta significance test was run with 1,000 permutations on the overall data set and for each well separately; it indicated that the weighted beta-diversity metrics were not significant (p-value ≫ 0.01 for each pair of environments; data not shown).
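
The read-filtering rule stated above (discard reads shorter than 300 nucleotides or containing one or more ambiguous bases) can be sketched as follows. This is an illustrative stand-in, not the QIIME implementation used in the protocol; the names `passes_filter` and `filter_reads` are hypothetical, and reads are assumed to be held in memory as a name-to-sequence mapping.

```python
from typing import Dict

# Only unambiguous nucleotides are accepted; anything else (N, R, Y, ...)
# counts as an ambiguity.
UNAMBIGUOUS = set("ACGT")


def passes_filter(read: str, min_length: int = 300) -> bool:
    """True if the read is at least `min_length` nt long and has no ambiguous base."""
    read = read.upper()
    return len(read) >= min_length and set(read) <= UNAMBIGUOUS


def filter_reads(reads: Dict[str, str], min_length: int = 300) -> Dict[str, str]:
    """Keep only the reads that satisfy both criteria."""
    return {name: seq for name, seq in reads.items() if passes_filter(seq, min_length)}
```

A read of exactly 300 nt is kept here, since the rule removes only reads shorter than 300 nt.
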
Because the weighted metrics were not significant, only unweighted beta-diversity metrics were considered here.

Canonical correlation analysis was performed to investigate relationships between OTU relative abundances and environmental parameters in both the control well HN-01 and the monitoring well HN-04. PERMANOVA tests were first applied to each variable with the adonis function of the R vegan package, which allowed the selection of significantly correlated parameters (i.e., p-value < 0.05). Correlations between the qualitative UniFrac distance matrix and the corresponding geochemical metadata were explored using a Mantel test with 9,999 permutations, based on Pearson's product-moment correlation. Pearson correlation coefficients (r-values) were evaluated to test linear correlations between the geochemical data and the abundance of taxonomic groups. All statistical analyses were performed using a paired-end reads OTU table normalized to the lowest sampling depth and computed in R-Studio v.3.3.2 using the vegan v.2.4-2 and ggplot2 v.2.2.1 packages. [...]

Among the series of functional genes coding for inorganic carbon assimilation and organic carbon degradation that were targeted by selective amplification (listed in the caption of Supplementary Table ), only three led to successful amplifications: first, cbbL and cbbM, respectively encoding form I and form II ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), a key enzyme for autotrophic CO2 fixation; second, with two different sets of primers (pheU and PHE), the genes coding for the largest subunit of multicomponent phenol hydroxylases (LmPHs) involved in the degradation of phenolic compounds. PCR primers and thermal conditions are described in Supplementary Table . Negative controls were carried out systematically for each PCR amplification; all were negative. Amplicons were cloned using the TOPO TA cloning kit (Invitrogen Life Technologies) according to the manufacturer's instructions.
After plating, positive transformants were screened by PCR amplification of inserts using the M13r and T7 flanking vector primers. Inserts of the expected size were sequenced by GATC Biotech AG (Konstanz, Germany). A total of 21, 2, 7, and 6 high-quality partial sequences were obtained for the cbbL, cbbM, pheU, and PHE primers, respectively, and retained for subsequent phylogenetic analyses. Potential chimeric sequences were identified manually by comparing several portions of the full-length environmental gene sequences with sequences in the GenBank database using BLAST. Sequences from the clone libraries, along with closely related environmental clones, closest cultivated members and some representative sequences of the major taxa, were aligned and selected for phylogenetic tree construction by the maximum likelihood method in the MEGA software. [...]

Metagenomic sequencing was done by the Josephine Bay Paul Center for Comparative Molecular Biology and Evolution at the Marine Biological Laboratory (Massachusetts, USA). The concentrations of the two genomic DNA samples (HN4_march12, HN4_may12) were determined by Quant-iT Picogreen dsDNA assay (Life Technologies, Carlsbad, CA). Samples were fragmented to ~170 bp using a Covaris S220 Focused-ultrasonicator (Covaris Inc., Woburn, MA), and metagenomic libraries were prepared according to the NuGEN Ovation® Ultralow Library system protocol (NuGEN Technologies, Inc., San Carlos, CA). Prior to sequencing, metagenomic libraries were visualized on an Agilent DNA 1000 Bioanalyzer chip (Agilent Technologies, Santa Clara, CA) and quantified using a KAPA SYBR® FAST Universal qPCR Kit (KAPA Biosystems, Boston, MA). Paired-end sequencing (2 × 113 bp) was performed on an Illumina HiSeq 1000 (Illumina, Inc., San Diego, CA). Base calls, sample demultiplexing, quality scores, and individual FASTQ files for each sample were generated with a CASAVA 1.7+ pipeline (Illumina Inc., San Diego, CA). [...]
Forward and reverse reads from paired-end sequenced DNA libraries were merged using the FLASH software with default parameters (minimum overlap, 10 nt; maximum allowed ratio between the number of mismatched base pairs and the overlap length, 0.25). Merged paired-end reads were assembled without further filtering with MEGAHIT v0.3.3-a using a k-mer range of 27 to 123 in steps of 10, with all other parameters set to their default values. Gene prediction on the resulting contigs was performed with Prodigal version 2.60 in metagenomic mode. The sequence pool thus generated was screened against RefSeq nr (release 68, Nov 3, 2014) and the COG database updated in 2014 using USEARCH v6.0.307 with the -ublast option and an E-value threshold of 1e-04 (-evalue). Normalization of bacterial metagenomic hit counts was performed by comparison with a set of 40 well-known single-copy gene families universally distributed among prokaryotic genomes, which are present in the COG database. Determination of the differential expression of cyc2 gene homologs was performed by direct sequence similarity search of two cyc2 homologs, namely AKN78226.1 (Cyc2PV-1 from Mariprofundus ferrooxydans) and ADE10507 (Slit_0265 from Sideroxydans lithotrophicus), against the appropriate full read data sets. […]
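
The normalization against single-copy gene families mentioned above can be illustrated with a minimal sketch. The protocol does not give the exact formula, so this assumes one common choice: dividing each gene's hit count by the mean hit count of the single-copy marker families, which yields an approximate number of copies per genome equivalent. The function name `normalize_hits` and the marker and gene labels in the usage note are hypothetical, and gene-length correction is deliberately left out.

```python
from statistics import mean
from typing import Dict


def normalize_hits(hit_counts: Dict[str, float],
                   marker_counts: Dict[str, float]) -> Dict[str, float]:
    """Express hit counts relative to the mean count of single-copy markers.

    hit_counts    : raw hit counts for the genes of interest
    marker_counts : raw hit counts for the single-copy marker families
                    (40 families in the protocol; any non-empty set works here)
    """
    if not marker_counts:
        raise ValueError("at least one marker family is required")
    baseline = mean(marker_counts.values())
    if baseline == 0:
        raise ValueError("marker families have no hits; cannot normalize")
    return {gene: count / baseline for gene, count in hit_counts.items()}
```

For example, with two placeholder marker families counted 10 and 30 times (mean 20), a gene with 40 hits would be reported at 2.0 copies per genome equivalent. Using the median instead of the mean would make the baseline less sensitive to outlier marker families.
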

Pipeline specifications

Software tools: BLASTN, CodonCode Aligner, UCHIME, ARB, RAxML, QIIME, UCLUST, PyNAST, ChimeraSlayer, RDP Classifier, FastTree, UniFrac, vegan, ggplot2, MEGA, BaseSpace, MEGAHIT, Prodigal, USEARCH
Databases: COGs
Organisms: cellular organisms
Chemicals: Carbon, Carbon Dioxide, Iron