Computational protocol: Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

[…] Standard molecular cloning techniques were performed according to established protocols (). Genomic DNA of clostridia was isolated using the ‘Epicentre MasterPure™ Gram Positive DNA purification kit’ (Biozym Scientific GmbH, Hessisch Oldendorf, Germany). Plasmid DNA of E. coli strains was obtained using the ‘Zyppy™ plasmid miniprep kit’ (Hiss Diagnostics GmbH, Freiburg, Germany). Fragments of clostridial DNA were amplified by PCR using ‘ReproFast polymerase’ (Genaxxon, Ulm, Germany).

Genomic DNA of C. coskatii ATCC PTA-10522 and C. ragsdalei DSM 15248 was sequenced using an Illumina MiSeq system (Illumina, San Diego, CA, USA). Illumina shotgun libraries were generated from the extracted DNA according to the manufacturer's protocol. Sequencing resulted in 2,179,216 300-bp paired-end reads for C. coskatii and 2,179,216 for C. ragsdalei. Reads were trimmed using Trimmomatic 0.32 () to remove sequences with quality scores lower than 20 (Illumina 1.9 encoding) and remaining adaptor sequences. De novo assembly with the SPAdes genome assembler 3.5.0 () resulted in 112 contigs (>500 bp) for C. coskatii and 79 contigs (>500 bp) for C. ragsdalei, with an average coverage of 91.62-fold and 396.2-fold, respectively. Automatic gene prediction was performed using Prodigal (). Genes coding for rRNA and tRNA were identified using RNAmmer () and tRNAscan-SE (), respectively. The IMG/ER system () was used for automatic annotation, which was subsequently curated manually using the Swiss-Prot, TrEMBL, and InterPro databases (). Genome sequences have been deposited at DDBJ/EMBL/GenBank under the accession numbers LROR00000000 (C. coskatii PTA-10522) and LROS00000000 (C. ragsdalei P11). The versions described in this paper are LROR01000000 and LROS01000000, respectively. [...]

High-quality genome sequences are available for C. ljungdahlii () and C. autoethanogenum (; ). A draft genome sequence of C. ragsdalei (328 contigs) is accessible through the “Integrated Microbial Genomes-Expert Review” (IMG/ER) system (). A draft genome sequence of C. coskatii was recently listed by , but unfortunately the authors deposited only raw data (SRR1970390) and not an annotated genome sequence at the NCBI (National Center for Biotechnology Information) database. Therefore, all subsequent analyses were performed using the genome sequences listed in Table .

Genome sequences of C. ljungdahlii DSM 13582, C. autoethanogenum DSM 10061, C. ragsdalei DSM 15248, and C. coskatii ATCC PTA-10522 (Table ) were analyzed using the ‘IMG/ER system’ () provided by the ‘DOE Joint Genome Institute’ (Walnut Creek, CA, USA). Orthologous genes (orthologs) among the genome sequences were identified using Proteinortho version 4.26 (default specification: blast = blastp v2.2.24, E-value = 1e-10, alg.-conn. = 0.1, coverage = 0.5, percent_identity = 50, adaptive_similarity = 0.95, inc_pairs = 1, inc_singles = 1, selfblast = 1, unambiguous = 0) (). The corresponding Excel file is available in the supplement (Supplementary Table ). Detailed gene analysis and comparison were performed using ‘CLC Workbench 7’ (CLC Bio, a QIAGEN Company, Boston, MA, USA). Gene sequences encoding alcohol dehydrogenases were derived from the respective genome sequences, and a multiple sequence alignment was calculated using MAFFT (). A phylogenetic tree was reconstructed with MrBayes v3.2.5 (). […]
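To make the Proteinortho cut-offs quoted above more concrete, the following Python sketch applies the same three numeric thresholds (E-value 1e-10, coverage 0.5, percent identity 50) to BLAST tabular output. It is an illustration only and not Proteinortho's algorithm: the adaptive reciprocal best-hit step and the algebraic-connectivity clustering are not reproduced. The default `-outfmt 6` column order, the definition of coverage as aligned query span divided by query length, and all names in the code (`Hit`, `filter_hits`, and so on) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Thresholds copied from the Proteinortho settings quoted in the text.
MAX_EVALUE = 1e-10
MIN_COVERAGE = 0.5
MIN_PERCENT_IDENTITY = 50.0


@dataclass(frozen=True)
class Hit:
    """One pairwise BLAST hit (hypothetical container for this sketch)."""
    query: str
    subject: str
    percent_identity: float
    query_start: int
    query_end: int
    evalue: float


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line, assuming the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10]))


def passes_thresholds(hit: Hit, query_lengths: Dict[str, int]) -> bool:
    """Check the E-value, identity, and coverage cut-offs for one hit.

    Coverage is taken here as aligned query span / query length, which is
    an assumption of this sketch rather than Proteinortho's exact definition.
    """
    aligned = abs(hit.query_end - hit.query_start) + 1
    coverage = aligned / query_lengths[hit.query]
    return (
        hit.evalue <= MAX_EVALUE
        and hit.percent_identity >= MIN_PERCENT_IDENTITY
        and coverage >= MIN_COVERAGE
    )


def filter_hits(lines: Iterable[str], query_lengths: Dict[str, int]) -> List[Hit]:
    """Return the hits that pass all three thresholds, skipping blank/comment lines."""
    hits = (parse_hit(l) for l in lines if l.strip() and not l.startswith("#"))
    return [h for h in hits if passes_thresholds(h, query_lengths)]
```

A caller would supply the BLAST lines together with a dictionary of query protein lengths (for example, read from the FASTA file used as BLAST input); hits that survive the filter are only candidates for orthology, since reciprocity is not checked here.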

Pipeline specifications

Software tools: Trimmomatic, SPAdes, Prodigal, RNAmmer, tRNAscan-SE, Proteinortho, BLASTP, MAFFT, MrBayes
Applications: Genome annotation, Phylogenetics
Organisms: Clostridium ljungdahlii, Clostridium autoethanogenum, Clostridium acetobutylicum
Chemicals: Acetaldehyde, Acetone, Ethanol, Butyrates, NADP, Nucleotides, Acetic Acid, 2-Propanol