Computational protocol: Metagenomic analysis of microbial community of an Amazonian geothermal spring in Peru

Protocol publication

[…] Microbial samples from extreme habitats (extremophiles) represent a huge reservoir of genetic diversity and a potent source of industrially important enzymes, but the culturability of microorganisms from extreme habitats is generally very poor. For a long time, full-length sequences generated by Sanger sequencing of 16S rRNA clone libraries were considered the gold standard for phylogenetic analysis, but it was soon realized that this procedure is not only expensive but also has many limitations. To overcome these limitations and to gain high taxonomic resolution of bacterial populations in extreme habitats, culture-independent 16S rRNA amplicon-based metagenome sequencing became common practice. Both 454 (Roche) and Illumina platforms are now widely used to study metagenomes and microbial diversity, but the more recent Illumina MiSeq/HiSeq sequencing platforms provide a more distinct and higher-quality view of microbial composition than other sequencing technologies.

Hot springs are unique sites for extremophilic microorganisms and have been of great interest for many years because enzymes obtained from them have proved extremely valuable as biocatalysts for industrial and biotechnological purposes. Moreover, many unknown microbial species and genes have been revealed in culture-independent microbial diversity assessments of hot springs. The Peruvian Amazon rainforest is one of the most biologically diverse areas on Earth and a rich source of novel microbial species. The Peruvian Amazon has few hot springs, and to date none of them has been explored in detail to generate a high-resolution microbial profile. The main goal of this study is therefore to generate a high-resolution microbial phylogenetic profile of an Amazonian hot spring.

In this study, water samples were collected from four random sampling points of AC and mixed in equal ratio, as previously described by Chan et al. in a metagenomic study of a Malaysian hot spring.
The temperature and pH were measured on-site. Metagenomic DNA was extracted using the PowerWater® DNA Isolation Kit (Mo Bio Laboratories) following the manufacturer's protocol. Twenty-five nanograms of NanoDrop-quantified DNA was used to amplify the V3–V4 region of the 16S rRNA gene with specific primers that also carry a ‘tag’ sequence complementary to the Illumina sequencing adapter and index primers from the Nextera XT Index Kit V2. This round of PCR generates single amplicons of ~530 bp. In the next round of PCR (indexing PCR), Illumina sequencing adapters and dual-indexing barcodes are added using limited-cycle PCR to give a final product of ~610 bp. The libraries were cleaned using HighPrep PCR magnetic beads (Magbio, Cat # AC-60050), quantified with Qubit and validated for quality by running an aliquot on a High Sensitivity Bioanalyzer Chip (Agilent). Finally, the cleaned libraries were sequenced on the Illumina HiSeq platform at Genotypic Technology Private Limited, Bangalore, India. The Illumina paired-end raw reads were quality checked using the FastQC tool. QIIME pipelines were used for 16S rRNA detection, clustering and OTU picking, followed by BIOM file generation and statistical analysis. SRA files were deposited in the NCBI database under Accession Number SRX1809286. Functional analysis of the 16S amplicons was performed using the default settings of PICRUSt version 0.9.1.

After quality filtration and adapter trimming of the raw reads, clean sequences were clustered into 5434 operational taxonomic units (OTUs) using a 97% similarity cut-off. The rarefaction curve indicated that a reasonable number of individuals were sampled (A). Good's coverage estimator indicated that > 99% of the species were captured, while high values of the Chao1 richness estimator (9762) and the Shannon diversity index (4.16) indicated that the microbial communities in AC are highly rich and diverse.
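The alpha-diversity estimators quoted above (Good's coverage, Chao1 and Shannon) can be illustrated with a minimal Python sketch. It is not the QIIME implementation and does not reproduce the study's values: the OTU counts are toy numbers, the function names are hypothetical, and the bias-corrected Chao1 form and the log base 2 for Shannon are assumptions, because the text does not state which variants were used.

```python
import math


def goods_coverage(counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs seen exactly once and twice.
    The bias-corrected form is an assumption; it stays defined when F2 == 0.
    """
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)); log base 2 is an assumption."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


# Toy OTU count vector (one entry per OTU), not data from the study.
toy_counts = [10, 5, 2, 2, 1, 1, 1]

print(round(goods_coverage(toy_counts), 3))  # 0.864
print(chao1(toy_counts))                     # 8.0
print(shannon(toy_counts))
```

A Chao1 estimate well above the observed OTU count, as in the study (9762 versus 5434 OTUs), reflects many OTUs seen only once or twice.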
All the resulting fragments were then classified into 58 phyla, 165 classes, 300 orders, 520 families and 954 genera. The five most represented phyla were Proteobacteria (88.06%), Thermi (6.43%), Firmicutes (3.41%), Aquificae (1.10%) and Chloroflexi (0.41%) (B); an unidentified bacterial phylum (0.10%) was also found among the top 10 bacterial phyla. An OTU-based phylogenetic tree displayed the genetic diversity within the AC microbial community (C). Gammaproteobacteria (86.1%) was the most represented class in the AC microbial community, followed by Deinococci (6.43%), Bacilli (3.22%), Betaproteobacteria (1.28%) and Aquificae (1.10%). A Krona chart was constructed to illustrate the distribution pattern of the phylum Proteobacteria in the AC hot spring (A). Further affiliation revealed that the most abundant genus in the AC microbial community was Acinetobacter (71.09%) of the family Moraxellaceae, in which most of the species were unidentified (48.29%); the remaining major genera included Pseudomonas (8.59%) of the family Pseudomonadaceae, Thermus (5.99%) of the family Thermaceae, Enhydrobacter (3.38%) of the family Moraxellaceae and Brevibacillus (1.52%) of the family Paenibacillaceae. The dominance of the phyla Proteobacteria, Firmicutes and Chloroflexi in hot springs has been reported in a few earlier studies. However, in contrast with our data, a 16S rRNA-based microbial diversity analysis of the Little Hot Creek hot springs in California (temperature 78.7–82.5 °C and pH 6.75–6.97), which are similar to the AC hot spring, showed dominance of the phyla Thermodesulfobacteria, Deinococcus–Thermus, Thermotogae and Dictyoglomi. Aquificae and Thermotogae are the two best-known hyperthermophilic bacterial phyla and show high heat tolerance.
In this study, an unknown genus (1.09%) dominated the Aquificae; the other two genera, detected at low abundance, were Hydrogenobacter (0.01%) and Hydrogenivirga (0.00%), while in Thermotogae the genus Fervidobacterium (0.09%) was the most represented.

Microbial diversity in hot springs is greatly affected by the pH of the water. Water samples from an acidic hot spring (pH 3.5–4) and a circumneutral hot spring (pH 7.2–7.4) were analyzed previously, and the results revealed that Thermotogae and Gammaproteobacteria dominated the acidic hot spring, while Thermodesulfobacteria, Gammaproteobacteria and Betaproteobacteria dominated the circumneutral hot spring. In alkaline hot springs, Thermus, Hydrogenobacter, Caldicellulosiruptor, Dictyoglomus and Fervidobacterium were reported to be the most represented genera. The AC hot spring is slightly acidic to circumneutral (pH 5.0–7.0) and, in agreement with previous reports, dominance of Gammaproteobacteria was also found in this study. Regarding archaeal community structure, Euryarchaeota and Crenarchaeota were the two major archaeal phyla detected in AC, although with low read counts. In a recent study, Thaumarchaeota and Crenarchaeota were found in both acidic and alkaline thermal spring environments, while Euryarchaeota was found only in the acidic environment. Further affiliation of the archaeal community in this study revealed that different unidentified families of the order pGrfC26 under Crenarchaeota and a few methanogenic members of the families Methanosaetaceae, Methanobacteriaceae and Methanosarcinaceae under Euryarchaeota were also present in the AC hot spring. Among these, the complete genome sequence of a strain of Methanobacterium thermoautotrophicum, one of the best methane-producing thermophilic archaea, is available in the database.

In this study, PICRUSt, a tool designed to infer metagenomic information from 16S amplicon sequencing data, was used.
PICRUSt uses the OTU table of assigned taxa and their relative distribution to generate the relative abundance of functional categories based on sequenced genomes. Predicted abundances were obtained for gene categories (COGs) (B) and metabolic pathways (KEGG) (C). In the COG analysis, apart from the “General” (12.2%) and “Unknown” (8.9%) function categories, the most represented category at the second tier was “Amino acid transport and metabolism” (8.5%), followed by “Transcription” (7.1%) and “Energy production and conversion” (6.5%) (B). Among the predicted KEGG pathways, “Metabolism” (50.6%) was the most abundant category at the first tier, followed by “Genetic information processing” (16.2%), “Unclassified” (15.8%) and “Environmental information processing” (12.6%) (C). In agreement with our KEGG functional analysis, dominance of the “Metabolism” category followed by “Genetic information processing” was previously reported in a Malaysian hot spring microbial community. […]
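The idea behind the functional prediction described above can be sketched in a few lines of Python. This is a simplified illustration, not PICRUSt's code or API: all OTU names, 16S copy numbers and gene contents are hypothetical toy values, and the two steps shown (dividing OTU counts by 16S copy number, then weighting each OTU's gene content by that normalized abundance) are a reduced view of the approach.

```python
def predict_functions(otu_counts, copy_number, gene_content):
    """Sum gene-family copies over OTUs, weighted by copy-number-normalized counts."""
    totals = {}
    for otu, count in otu_counts.items():
        # Step 1: correct read counts for the 16S copy number of this OTU.
        normalized = count / copy_number[otu]
        # Step 2: add this OTU's contribution to every functional category.
        for function, copies in gene_content[otu].items():
            totals[function] = totals.get(function, 0.0) + normalized * copies
    return totals


def to_percent(totals):
    """Convert absolute predicted abundances to relative abundances in percent."""
    grand_total = sum(totals.values())
    return {name: 100.0 * value / grand_total for name, value in totals.items()}


# Hypothetical inputs: read counts per OTU, 16S copies per genome,
# and gene copies per functional category in each reference genome.
otu_counts = {"otu_A": 70, "otu_B": 20, "otu_C": 10}
copy_number = {"otu_A": 7, "otu_B": 4, "otu_C": 1}
gene_content = {
    "otu_A": {"Metabolism": 3, "Genetic information processing": 1},
    "otu_B": {"Metabolism": 2, "Environmental information processing": 2},
    "otu_C": {"Metabolism": 1, "Genetic information processing": 1},
}

predicted = to_percent(predict_functions(otu_counts, copy_number, gene_content))
for category, percent in sorted(predicted.items(), key=lambda item: -item[1]):
    print(f"{category}: {percent:.1f}%")
```

Because the predictions depend on reference genomes, percentages such as those reported for the COG and KEGG categories are inferred rather than measured.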

Pipeline specifications

Software tools: FastQC, QIIME, PICRUSt
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Chemicals: Amino Acids