Computational protocol: A Comprehensive Insight into Tetracycline Resistant Bacteria and Antibiotic Resistance Genes in Activated Sludge Using Next-Generation Sequencing

Protocol publication

[…] The PCR products of tetG were cloned to investigate the diversity of tetG in the sludge treated with 0 and 20 mg/L tetracycline for 6 days. The purified PCR products were cloned into the pMD19-T Vector (TaKaRa, Shiga, Japan). A total of 52 clones were randomly selected for library construction, sequencing and subsequent similarity analysis. Nucleotide sequences of tetG were aligned using CLUSTALW []. Clones sharing a consensus sequence were grouped into one genotype, and only one representative of each group was selected for phylogenetic tree construction. The GenBank sequences with the highest identity to the sequences obtained in this study were retrieved for phylogenetic tree construction. Neighbor-joining trees were constructed using Molecular Evolutionary Genetics Analysis (MEGA version 5.05) [], and bootstrap analysis with 1000 replicates was conducted to evaluate the significance of the nodes. The 52 tetG clone sequences obtained in this study have been deposited in NCBI (accession numbers KJ603168 to KJ603219). [...]
The metagenomic DNA extracted from the sludge cultured with 0 and 20 mg/L tetracycline was individually subjected to high-throughput sequencing on an Illumina HiSeq 2000 (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions. The “Index 101 PE” sequencing strategy (paired-end sequencing with 101-bp reads and an 8-bp index sequence) was used, which generates nearly equal amounts of clean reads for each sample. A base-calling pipeline (Sequencing Control Software, Illumina, San Diego, CA, USA) was applied to process the raw fluorescence images and call the sequences. Raw reads containing three or more “N” bases or contaminated by adapter sequence (>15 bp overlap) were removed, and the filtered clean reads (about 1.6 Gb per sample) were used for further metagenomic analyses.
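The read-filtering rule above (discard reads with three or more “N” bases or with more than 15 bp of adapter overlap) can be sketched as follows. This is an illustration, not the Illumina pipeline actually used: it assumes “overlap” means an exact shared substring of at least 16 bp between read and adapter (no mismatches allowed), and the adapter string in the test is a hypothetical placeholder rather than a real Illumina adapter.

```python
from typing import Iterable, Iterator

MAX_N = 2                 # reads with three or more "N" bases are discarded
MIN_ADAPTER_OVERLAP = 16  # ">15 bp overlap" with the adapter marks contamination


def shares_kmer(read: str, adapter: str, k: int) -> bool:
    """Return True if read and adapter share any exact substring of length k."""
    if len(adapter) < k or len(read) < k:
        return False
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    return any(read[i:i + k] in adapter_kmers for i in range(len(read) - k + 1))


def is_clean(read: str, adapter: str) -> bool:
    """Apply both filters: the N-count cutoff and the adapter-overlap cutoff."""
    read = read.upper()
    if read.count("N") > MAX_N:
        return False
    # Assumption: exact-match overlap; real trimmers usually tolerate mismatches.
    return not shares_kmer(read, adapter.upper(), MIN_ADAPTER_OVERLAP)


def filter_reads(reads: Iterable[str], adapter: str) -> Iterator[str]:
    """Yield only the reads that pass both filters, in their original order."""
    for read in reads:
        if is_clean(read, adapter):
            yield read
```

Using a set of adapter k-mers keeps the check linear in read length; a production pipeline would normally rely on a dedicated adapter trimmer that scores partial, mismatched overlaps.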
The sequencing data were deposited in the metagenomics RAST server (MG-RAST) [] under accession numbers 4494851.3 (sludge treated with 20 mg/L tetracycline) and 4494856.3 (sludge without tetracycline treatment). [...]
After 454 pyrosequencing, all reads were processed with the Pyrosequencing Pipeline Initial Process [] of the Ribosomal Database Project (RDP) to (1) sort the reads into their designated samples based on their nucleotide barcodes; (2) trim off the adapters and barcodes using the default parameters; and (3) remove sequences containing ambiguous “N” bases or shorter than 200 bp []. Sequences were de-noised using the “pre.cluster” command in the Mothur platform to remove erroneous sequences caused by pyrosequencing errors [,]. PCR chimeras were filtered out using Chimera Slayer []. The reads flagged as chimeras were extracted and submitted to RDP; those assigned to any known genus with 90% confidence were merged back with the non-chimeric reads [] to form the “effective sequences”. The effective sequences of each sample were resubmitted to the RDP Classifier [] to distinguish archaeal from bacterial sequences, and the unexpected archaeal sequences were manually removed. To study the effect of tetracycline on the microbial communities, the day-6 samples were individually pyrosequenced, generating a total of 42,556 reads for the four samples. As shown in , low-quality reads were filtered using RDP, and the effective reads were obtained after trimming the adapters, barcodes and primers. After denoising, chimera filtering and removal of archaeal sequences, the smallest of the four samples contained 7,097 bacterial sequences. To compare the four samples fairly at the same sequencing depth, each sample was normalized to 7,097 sequences for subsequent bioinformatics analyses. Taxonomic assignment of the sequences was performed separately using the RDP Classifier.
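The text does not state how the four samples were normalized to 7,097 sequences. A common approach is random subsampling without replacement down to the smallest library size (rarefying); the sketch below assumes that approach, and the sample names in the test are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def normalize_depth(
    samples: Dict[str, Sequence[str]],
    depth: Optional[int] = None,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Randomly subsample every sample, without replacement, to the same depth.

    If depth is None, the size of the smallest sample is used
    (7,097 bacterial sequences in the study described above).
    """
    if depth is None:
        depth = min(len(seqs) for seqs in samples.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    normalized: Dict[str, List[str]] = {}
    for name, seqs in samples.items():
        if len(seqs) < depth:
            raise ValueError(f"{name} has only {len(seqs)} sequences (< {depth})")
        normalized[name] = rng.sample(list(seqs), depth)
    return normalized
```

Fixing the random seed makes the normalized data set reproducible; repeating the subsampling with several seeds is one way to check that downstream indices are not sensitive to a particular draw.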
A bootstrap cutoff of 80% was applied to assign the sequences to different taxonomic levels. Richness and diversity indices, including the number of OTUs, the Chao1 estimator and the Shannon index, as well as rarefaction curves, were calculated using the relevant RDP modules (Rarefaction, and Shannon & Chao1 index).
Illumina sequencing reads were aligned against custom-built databases via offline BLAST to identify ARGs and plasmids in the sludge samples. A protein database of ARGs was created by downloading all 7,828 sequences in ARDB []. A read was identified as an ARG according to its best BLAST hit (blastx) if the similarity was above 90% and the alignment was at least 25 amino acids long []. The nucleotide sequences of plasmids were downloaded from the NCBI RefSeq database (2,408 plasmid genome sequences). A read was annotated as a plasmid sequence if its best BLAST hit (blastn) had a nucleotide sequence similarity >95% over an alignment of at least 90 bp []. […]
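The best-hit annotation rule above can be illustrated with a small parser for BLAST tabular output. This sketch assumes the default `-outfmt 6` column order, treats the reported percent identity (`pident`) as the “similarity” in the text, and picks each read's best hit by highest bit score; the read and subject IDs in the test are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Default BLAST tabular (-outfmt 6) columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
Hit = Tuple[str, float, int, float]  # (subject, pident, alignment length, bitscore)

ARG_CUTOFFS = (90.0, 25)      # blastx: identity > 90%, alignment >= 25 aa
PLASMID_CUTOFFS = (95.0, 90)  # blastn: identity > 95%, alignment >= 90 bp


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit for each query read."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        pident, length, bitscore = float(f[2]), int(f[3]), float(f[11])
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, pident, length, bitscore)
    return best


def annotate(lines: Iterable[str], min_identity: float, min_length: int) -> Dict[str, str]:
    """Return {read: subject} for reads whose best hit passes both cutoffs."""
    return {
        query: subject
        for query, (subject, pident, length, _) in best_hits(lines).items()
        if pident > min_identity and length >= min_length
    }
```

Filtering only the best hit (rather than any hit) mirrors the rule in the text: a read whose top hit fails the cutoffs is left unannotated even if a weaker hit would pass.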

Pipeline specifications