Protocol publication

[…] Two metagenomic libraries were constructed from genomic DNA extracted from the SE and LE sludge samples, respectively. Genomic DNA was extracted from 500 mg (dry weight) of sludge with the FastDNA® SPIN Kit for Soil (MP Biomedicals, LLC, Illkirch, France). The metagenomic DNA was sequenced on the Illumina HiSeq 2000 platform at BGI (Shenzhen, China) using a 101 bp paired-end (PE) strategy, with combined insert lengths of 180 and 800 bp for the SE metagenome and a single 180 bp insert for the LE metagenome (Additional file : Table S2). The resulting PE reads were trimmed of sequencing adaptors, and reads with an average Phred quality score lower than 20 or containing ambiguous nucleotides were then filtered out using PRINSEQ []. The shotgun metagenomic reads have been deposited in the MG-RAST server for data sharing (see Table S1 for the accession number). The SE and LE metagenomes and the LE metatranscriptome were used in our previous studies, which focused on topics other than Anaerolineae populations [, ]. [...] Three popular de novo assemblers, namely MetaVelvet (1.2.01) [], IDBA_UD (1.1.1) [], and CLCbio Genomic Workbench 6.0.2 (CLCbio, Denmark), were compared in terms of read utilization efficiency and scaffold length (Additional file : Table S9). IDBA_UD, the most comprehensive of the three, was selected to co-assemble the SE and LE metagenomes using a series of k-mer sizes (20, 40, 60, 80, and 100). The two metagenomes were assembled together to facilitate the generation of long scaffolds. Only scaffolds longer than 1 kb were kept for subsequent genomic binning analysis. Based on the assumption that scaffolds belonging to the same genome (strain) should share similar coverage across different metagenomes, scaffolds of the targeted Anaerolineae genome bins were recruited from the two-dimensional coverage plot using R scripts [].
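The read-filtering criteria described above (average Phred score of at least 20, no ambiguous nucleotides) can be illustrated with a minimal Python sketch. This is not PRINSEQ itself and does not reproduce its options. The function names, the Phred+33 quality encoding, and the rule that a pair is dropped when either mate fails are all assumptions made for illustration.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def passes_filter(seq, quality, min_mean_q=20):
    """True if the read has no ambiguous base (N) and mean quality >= min_mean_q."""
    if not seq or "N" in seq.upper():
        return False
    return mean_phred(quality) >= min_mean_q


def filter_pairs(pairs, min_mean_q=20):
    """Keep a read pair only when both mates pass.

    `pairs` is a list of ((seq1, qual1), (seq2, qual2)) tuples. Dropping the
    whole pair when one mate fails is an assumption, not stated in the text.
    """
    return [
        (r1, r2)
        for r1, r2 in pairs
        if passes_filter(*r1, min_mean_q) and passes_filter(*r2, min_mean_q)
    ]
```

In practice the same thresholds would be passed to the filtering tool rather than reimplemented; the sketch only makes the two criteria explicit.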
Divergent coverage of Chloroflexi populations was provided by metagenomic libraries of thermophilic cellulolytic sludge sampled from the same reactor at two different times (SE at 120 days and LE at 545 days). The coverage sets of scaffolds were obtained by independently mapping the PE reads of the SE and LE metagenomes against the assembled scaffolds using Bowtie 1.0.1 [], allowing two mismatches over the entire read length (bowtie options: -v 2 -m 200) []. The coverage of a scaffold was calculated as the total base pairs of mapped reads divided by the scaffold length. The scaffolds were then binned based on the clustering of coverage and phylum assignment. To minimize potential contamination, another genomic signature, tetranucleotide frequency (TNF), was used to refine the bins at a Euclidean distance cutoff of 0.1 []. Finally, the PE-tracking tools from the mmgenome package [] were used to reinforce the scaffolding by retrieving genes initially excluded, for example, genes showing deviating coverage caused by multiple copies. At the same time, community composition was assessed by identifying 16S rRNA sequences in the metagenomes. The unassembled Illumina reads were searched against the Silva SSU 115 database [] with BLASTN [] using an E-value cutoff of 1E-20. The tabular BLAST results were parsed at the phylum level with MEGAN4 [] using the lowest common ancestor algorithm. [...] The complete 16S rRNA genes of genome bins TCF-2, 5, and 12 were determined by the IMG 4.0 genome annotation pipeline [] and confirmed by EMIRGE []. EMIRGE was used as a complementary approach to reconstruct 16S rRNA genes from the shotgun libraries with 80 iterations. UCHIME [] was used to filter out possible chimeras formed in EMIRGE before the reconstructed 16S rRNA genes were compared with those of the curated genome bins.
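The two binning signals described above, per-scaffold coverage and tetranucleotide frequency (TNF) distance, can be sketched in Python as follows. This is a minimal illustration, not the authors' R scripts. The function names are hypothetical, and the TNF normalization (relative frequencies over all 256 tetramers, without merging reverse complements) is an assumption, since the text does not specify it.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def scaffold_coverage(mapped_read_lengths, scaffold_length):
    """Coverage = total base pairs of mapped reads / scaffold length."""
    return sum(mapped_read_lengths) / scaffold_length


def tetra_freq(seq):
    """Relative tetranucleotide frequencies (assumed normalization)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def euclidean(a, b):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, 50 mapped reads of 101 bp on a 1010 bp scaffold give a coverage of 5050 / 1010 = 5. Under the cutoff quoted above, a scaffold would stay in a bin only if its TNF distance to the bin falls within 0.1; how the bin's reference vector is defined is not stated in the text.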
The incomplete 16S rRNA gene prediction in TCF-13 (258 bp) was manually extended based on its nearly identical BLAST match (similarity higher than 99 % over 258 bp) to a 16S rRNA sequence in the Silva SSU database (release 115). [...] To determine the phylogenetic position of the draft genomes obtained here, a neighbor-joining tree of Anaerolineae was built using MEGA5 [] with the maximum-likelihood method and a bootstrap value of 1000. The phylogenetic tree was constructed using (1) 16S rRNA sequences of the draft genomes, (2) the 16S rRNA gene of A. thermophila UNI-1, and (3) 16S rRNA genes of ten isolated strains and high-quality 16S clones collected from the Silva SSU database. To determine the phylogenetic affiliation of TCF-8, whose 16S rRNA gene is too short for reliable alignment, a genome tree was constructed from a concatenated alignment of 35 protein-coding ESCGs shared as single copies among the five curated genomes and twenty-two finished genomes of Chloroflexi in IMG 4.0. A maximum-likelihood tree was created using PhyML 3.1 [] with the default settings for amino acids and 100 bootstraps, based on MUSCLE [] alignments. […]
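The concatenation step behind the genome tree can be sketched as below. This is a minimal sketch under stated assumptions: each marker gene has already been aligned separately (for example with MUSCLE), each genome contributes at most one sequence per gene, and only genes present in every genome are kept. The function name and data layout are hypothetical and not taken from the publication.

```python
def concatenate_alignments(alignments, genomes):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: {gene_name: {genome_name: aligned_sequence}}
    genomes:    list of genome names that must all carry a gene for it to be used

    Returns (genes_used, {genome_name: concatenated_sequence}).
    """
    # Keep only genes found in every genome, in a reproducible order.
    shared = sorted(
        gene for gene, aln in alignments.items()
        if all(g in aln for g in genomes)
    )
    supermatrix = {g: "" for g in genomes}
    for gene in shared:
        aln = alignments[gene]
        # Every row of one gene alignment must have the same length.
        if len({len(aln[g]) for g in genomes}) != 1:
            raise ValueError(f"unequal alignment lengths for {gene}")
        for g in genomes:
            supermatrix[g] += aln[g]
    return shared, supermatrix
```

The resulting supermatrix would then be written out in the format expected by the tree-building program, a step omitted here.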

Pipeline specifications