Computational protocol: Pyrosequencing Investigation into the Bacterial Community in Permafrost Soils along the China-Russia Crude Oil Pipeline (CRCOP)

[…] Data preprocessing was performed mainly with mothur. Standard primers and barcodes were trimmed from the raw sequences, and the reads were assembled into contigs. Sequences shorter than 150 bp and sequences with more than 3% low-quality bases (quality score <27) were removed. Chimeric sequences were also excluded with the chimera.uchime command using default parameters. The remaining valid sequences were trimmed to 300 bp, aligned with the Needleman algorithm, and clustered against the bacterial SILVA database (SILVA 108). The candidate sequences were assigned to taxonomy with the classify.seqs command (Bayesian approach). The dist.seqs command was used to generate the distance matrix between aligned DNA sequences; gap comparisons and terminal gaps were handled with the options calc = onegap and countends = T. The sequences were then clustered into operational taxonomic units (OTUs) at 97% sequence identity using mothur (furthest neighbor method) and chopseq (Majorbio). Rarefaction analysis was performed with mothur and plot-rarefaction (Majorbio). From these results, the Shannon diversity and the Chao1 richness estimates were calculated with mothur. The weighted UniFrac distance was used to quantify differences in community composition. The heatmap and the Venn diagrams were generated with the R packages pheatmap and VennDiagram, respectively. In addition, weighted principal component analysis (PCA) and nonmetric multidimensional scaling (NMDS) plots were generated with the R package vegan to show the clustering of the different samples. The sequences for this article have been deposited in the NCBI SRA under the accession number SRA057910. […]
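
As an illustration only, the length and quality filter and the two diversity estimates described above can be sketched in Python. This is not the mothur implementation used in the study. The function names are hypothetical, the filter assumes a read fails if either criterion is violated, the Shannon index is taken with the natural logarithm, and Chao1 is written in its bias-corrected form; all of these are assumptions rather than details stated in the protocol.

```python
import math

# Thresholds taken from the protocol text.
MIN_LENGTH = 150                  # minimum read length in bp
MIN_QUALITY = 27                  # bases scoring below this count as low quality
MAX_LOW_QUALITY_FRACTION = 0.03   # at most 3% low-quality bases


def passes_filter(quality_scores):
    """Return True if a read passes the length and quality criteria.

    quality_scores: one Phred score per base (hypothetical input format).
    Assumption: a read is discarded if EITHER criterion fails.
    """
    n = len(quality_scores)
    if n < MIN_LENGTH:
        return False
    low = sum(1 for q in quality_scores if q < MIN_QUALITY)
    return low / n <= MAX_LOW_QUALITY_FRACTION


def shannon(otu_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + n1 * (n1 - 1) / (2 * (n2 + 1)).

    n1 and n2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    s_obs = sum(1 for c in otu_counts if c > 0)
    n1 = sum(1 for c in otu_counts if c == 1)
    n2 = sum(1 for c in otu_counts if c == 2)
    return s_obs + n1 * (n1 - 1) / (2 * (n2 + 1))


if __name__ == "__main__":
    # Toy OTU abundance vector for one sample (made-up numbers).
    counts = [1, 1, 2, 5]
    print("Shannon:", shannon(counts))
    print("Chao1:", chao1(counts))
```

The OTU counts in the usage section are invented for demonstration and do not come from the study's data.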

