Computational protocol: The Draft Genome of the Non-Host-Associated Methanobrevibacter arboriphilus Strain DH1 Encodes a Large Repertoire of Adhesin-Like Proteins

[…] The genome of Methanobrevibacter arboriphilus strain DH1 was sequenced with a combined approach using the 454 GS-FLX Titanium XL system (Titanium GS70 chemistry, Roche Life Science, Mannheim, Germany) and the Genome Analyzer IIx (Illumina, San Diego, CA). Shotgun libraries were prepared according to the manufacturers' protocols, yielding 99,511 shotgun reads from 454 sequencing and 11,827,196 paired-end Illumina reads of 112 bp. The Illumina reads were quality trimmed using Trimmomatic version 0.32 []. All of the 454 shotgun reads and 2,637,606 of the Illumina reads were used for the initial hybrid de novo assembly with MIRA 3.4 [] and Newbler 2.8 (Roche Life Science, Mannheim, Germany). The final assembly contained 40 contigs with an average coverage of 92.85-fold. The assembly was validated and the read coverage determined with QualiMap version 2.1 []. The quality and completeness of the draft genome were validated with CheckM [].
[...] The genome data were uploaded to the Integrated Microbial Genomes Expert Review (IMG/ER) platform. Coding sequences were predicted and annotated using the automated pipeline of IMG/ER []. Briefly, protein-encoding genes were identified with GeneMark [], and candidate homologous genes of the genomes were computed using BLASTp []. Automated annotations of coding sequences were verified and curated by comparing annotations from several functional resources, such as COG clusters, Pfam, TIGRfam, and Gene Ontology. The annotated genome sequence of M. arboriphilus strain DH1 (Gs0106968 or Gp0076455) is available in the Genomes Online Database (GOLD).
[...] Aligned sequences were selected from RIM-DB [] and exported in PHYLIP format to construct phylogenetic trees using all available base positions. Maximum likelihood phylogenetic trees based on aligned archaeal 16S rRNA gene sequences were generated using RAxML version 7.0.3 []. The parameters “-m GTRGAMMA -# 500 -f a -x 2 -p 2” were used. [...]
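The RAxML parameters quoted above can be assembled into a complete command line. The following is a minimal sketch, not the authors' actual invocation: the executable name `raxmlHPC`, the alignment file `archaea_16S.phy`, and the run name `dh1_16S` are placeholders that do not appear in the protocol. Only the flags `-m`, `-#`, `-f`, `-x`, and `-p` and their values are taken from the text; `-s` and `-n` are the standard RAxML options for the input alignment and the output name.

```python
import shlex


def build_raxml_command(alignment, run_name, executable="raxmlHPC"):
    """Return the argument list for a rapid-bootstrap ML analysis in RAxML.

    `alignment`, `run_name`, and `executable` are illustrative placeholders.
    """
    return [
        executable,
        "-s", alignment,   # input alignment in PHYLIP format
        "-n", run_name,    # suffix used for RAxML output files
        "-m", "GTRGAMMA",  # GTR substitution model with gamma rate heterogeneity
        "-#", "500",       # 500 bootstrap replicates
        "-f", "a",         # rapid bootstrapping plus search for the best-scoring ML tree
        "-x", "2",         # random seed for rapid bootstrapping
        "-p", "2",         # random seed for parsimony starting trees
    ]


cmd = build_raxml_command("archaea_16S.phy", "dh1_16S")
print(shlex.join(cmd))  # shell-quoted command string for logging
```

With RAxML installed, the list could be passed to `subprocess.run(cmd, check=True)`. Fixing both seeds (`-x 2 -p 2`), as the protocol does, makes the bootstrap analysis repeatable.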
Orthologous genes (orthologs) among genome sequences were identified using Proteinortho version 4.26 (default specification: BLAST = BLASTp v2.2.24, E value = 1e−10, alg.-conn. = 0.1, coverage = 0.5, percent_identity = 50, adaptive_similarity = 0.95, inc_pairs = 1, inc_singles = 1, selfblast = 1, and unambiguous = 0) []. COG categories of the genes were extracted from IMG database entries of M. arboriphilus DH1. […]
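How the listed thresholds act on individual BLASTp hits can be illustrated with a simplified filter. This is a sketch under stated assumptions, not Proteinortho's implementation: the `Hit` record, its field names, and the example values are hypothetical; coverage is taken as the aligned fraction of the query; and the adaptive similarity step is approximated by keeping hits whose bit score is at least 0.95 of the best hit for the same query. The reciprocal-hit check and the clustering step governed by the algebraic connectivity threshold are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLASTp hit (field names are illustrative)."""
    query: str
    target: str
    evalue: float
    identity: float  # percent identity, 0-100
    coverage: float  # aligned fraction of the query, 0-1
    bitscore: float


def filter_hits(hits, max_evalue=1e-10, min_identity=50.0,
                min_coverage=0.5, similarity=0.95):
    """Apply the hard thresholds, then keep near-best hits per query."""
    passing = [
        h for h in hits
        if h.evalue <= max_evalue
        and h.identity >= min_identity
        and h.coverage >= min_coverage
    ]
    by_query = defaultdict(list)
    for h in passing:
        by_query[h.query].append(h)

    kept = []
    for query_hits in by_query.values():
        best = max(h.bitscore for h in query_hits)
        # Simplified adaptive step: keep hits scoring close to the best one.
        kept.extend(h for h in query_hits if h.bitscore >= similarity * best)
    return kept


# Made-up example data, for illustration only.
hits = [
    Hit("geneA", "orth1", 1e-50, 80.0, 0.9, 300.0),
    Hit("geneA", "orth2", 1e-48, 78.0, 0.9, 290.0),  # within 95% of the best score
    Hit("geneA", "weak", 1e-12, 55.0, 0.6, 100.0),   # passes thresholds, far below best
    Hit("geneB", "lowid", 1e-30, 40.0, 0.8, 150.0),  # fails the identity threshold
]
kept = filter_hits(hits)
print([(h.query, h.target) for h in kept])
```

In this toy input, only the two strong `geneA` hits survive: the third hit passes the hard thresholds but falls well below the best bit score, and the `geneB` hit fails the 50% identity cut-off.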

Pipeline specifications

Software tools: GeneMark, BLASTP, PHYLIP, RAxML, Proteinortho
Databases: Pfam, GOLD, RIM-DB
Applications: Genome annotation, Phylogenetics
Chemicals: Methane, Oxygen