Computational protocol: Candidatus Nitrosocaldus cavascurensis, an Ammonia Oxidizing, Extremely Thermophilic Archaeon with a Highly Mobile Genome

[…] DNA was prepared from 480 mL of culture using standard procedures and sequenced on a PacBio Sequel sequencer at the VBCF (Vienna BioCenter Core Facilities GmbH) with a SMRT Cell 1Mv2 and the Sequel Sequencing Kit 2.1. The insert size was 10 kb. Around 500,000 reads were obtained, with an average length of ∼5 kb (N50: 6,866 nt; maximum read length: 84,504 nt). The PacBio reads were assembled with the CANU program (version 1.4, parameters “genomeSize = 20 m, corMhapSensitivity = normal, corOutCoverage = 1000, errorRate = 0.013”) and then “polished” with the arrow program from the SMRT Analysis software (Pacific Biosciences, United States). The Ca. N. cavascurensis genome assembly consisted of two overlapping contigs of 1,580,543 bp and 15,533 bp. The resulting assembly was compared with a previous version of the genome bin, which we had obtained by assembling ∼150 nt reads from five IonTorrent PGM sequencing runs with IDBA-UD and Newbler, followed by differential coverage binning (using GC% and read coverage from the IonTorrent PGM reads). The new assembly confirmed the previous version: all nine contigs of the previous assembly aligned with MUMmer to the larger of the two contigs, except for a ca. 21 kb region that we had not selected during manual differential coverage binning because its read coverage varied between runs. Interestingly, this region belongs to an integrated conjugative element (ICE1, see Results) that might be excised occasionally. A region repeated at both ends of the longest contig and in the second, smaller contig was identified and analyzed using the nucmer program from the MUMmer package (see Supplementary Figure). This region was merged based on the nucmer results, and the longest contig was “circularized” accordingly, since the sequence of the second contig was nearly identical (>99.5% identity) to the ends of the long contig.
This repeated region coincided with a repeat-rich adhesin (Supplementary Figure). The origin of replication was predicted using the Ori-Finder 2 webserver. By analogy with Nitrosopumilus maritimus, the origin was placed after the last annotated genomic feature, before the ORB repeats and the cdc gene. The annotated genome sequence has been deposited in the European Nucleotide Archive (ENA) under study accession number PRJEB24312. […]
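The read statistics reported above (an N50 of 6,866 nt from around 500,000 reads of ∼5 kb) can be made concrete with a short sketch. It shows the usual N50 definition and a back-of-envelope estimate of raw depth. The function names are hypothetical, the read lengths in the example are toy values, and the depth figure is only as precise as the approximate inputs taken from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("at least one read length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first read at which the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length


def raw_depth(n_reads, mean_read_length, genome_size):
    """Approximate raw sequencing depth: total bases divided by genome size."""
    return n_reads * mean_read_length / genome_size


# Toy read lengths (illustrative only, not the study's data).
print(n50([2, 3, 4, 5, 6, 10]))  # → 6

# Rough depth from the approximate figures in the text
# (~500,000 reads of ~5 kb, ~1.58 Mb genome); the result is approximate too.
print(round(raw_depth(500_000, 5_000, 1_580_543)))  # → 1582
```

In practice N50 is computed from the full list of read lengths (for example parsed from the FASTQ file); sorting once makes the computation O(n log n).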
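The 21 kb ICE1 region was left out of the earlier bin because its read coverage varied between the five IonTorrent runs. The sketch below shows one way such variability could be quantified alongside GC%, using the coefficient of variation of per-run coverage. This is an assumption for illustration: the original binning was done manually, and the metric, the `max_cv` cut-off, the function names, and the example data are all hypothetical.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def coverage_cv(per_run_coverage):
    """Coefficient of variation (population SD / mean) of coverage across runs."""
    m = mean(per_run_coverage)
    return pstdev(per_run_coverage) / m if m > 0 else float("inf")


def flag_variable_contigs(contigs, max_cv=0.5):
    """Names of contigs whose coverage varies strongly between runs.

    contigs maps a contig name to (sequence, [mean coverage in run 1, run 2, ...]).
    max_cv is a hypothetical cut-off chosen for illustration only.
    """
    return sorted(name for name, (_, cov) in contigs.items()
                  if coverage_cv(cov) > max_cv)


# Toy data for five runs (hypothetical values, not from the study).
contigs = {
    "core_1": ("GGCCATGC", [100, 110, 95, 105, 100]),
    "mobile_1": ("ATGCATAT", [10, 200, 5, 150, 20]),
}
for name, (seq, cov) in contigs.items():
    print(name, round(gc_fraction(seq), 2), round(coverage_cv(cov), 2))
print(flag_variable_contigs(contigs))  # → ['mobile_1']
```

A contig flagged this way is not necessarily a contaminant; as with ICE1, variable coverage can also point to a mobile element whose copy number differs between cultures or runs.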
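Circularization, as described above, removes the duplicated copy of a sequence that appears at both ends of a linear contig once the overlap is confirmed at >99.5% identity. The authors did this with gapped nucmer alignments and also used the second contig; the sketch below is a simplified, hypothetical stand-in that only checks a single contig for an ungapped suffix/prefix overlap. The function names and the `min_overlap`/`max_overlap` defaults are assumptions, and the brute-force search is only practical for short overlaps.

```python
def ungapped_identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_end_overlap(contig, min_overlap=50, max_overlap=50_000, min_identity=0.995):
    """Length of the longest suffix that matches the prefix of the same contig
    with at least min_identity (no gaps allowed), or 0 if none is found."""
    longest = min(len(contig) // 2, max_overlap)
    # Try the longest candidate first so the full duplicated region is removed.
    for k in range(longest, min_overlap - 1, -1):
        if ungapped_identity(contig[-k:], contig[:k]) >= min_identity:
            return k
    return 0


def circularize(contig, **kwargs):
    """Drop the terminal copy of an end overlap so each base appears once."""
    k = find_end_overlap(contig, **kwargs)
    return contig[:-k] if k else contig


ring = "ACGTACGGTCAGTTGACCATGGCATTAGC"   # toy circular sequence (hypothetical)
linear = ring + ring[:10]               # linear output repeating the first 10 bases
print(find_end_overlap(linear, min_overlap=5))     # → 10
print(circularize(linear, min_overlap=5) == ring)  # → True
```

An alignment-based tool such as nucmer is preferable on real data because it tolerates indels, which are common in long-read assemblies; the ungapped identity here only tolerates substitutions.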
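The origin above was predicted with Ori-Finder 2 and positioned by analogy with N. maritimus. Ori-Finder 2 integrates several lines of evidence, so the sketch below is not its method: it only illustrates one common compositional signal, the minimum of cumulative GC skew, together with re-linearizing a circular sequence so that it starts at a chosen coordinate. The function names, window size, and toy genome are hypothetical, and in archaea the skew signal can be weak or reflect more than one origin.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return (positions, values): cumulative (G - C) / (G + C) summed over
    consecutive windows, reported at each window boundary (starting at 0)."""
    seq = seq.upper()
    positions, values = [0], [0.0]
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        chunk = seq[start:end]
        g, c = chunk.count("G"), chunk.count("C")
        positions.append(end)
        values.append(values[-1] + ((g - c) / (g + c) if g + c else 0.0))
    return positions, values


def skew_minimum_position(seq, window=1000):
    """0-based position where cumulative GC skew is lowest (a rough origin proxy)."""
    positions, values = cumulative_gc_skew(seq, window)
    return positions[values.index(min(values))] % len(seq)


def rotate(seq, new_start):
    """Re-linearize a circular sequence so that position new_start becomes position 0."""
    new_start %= len(seq)
    return seq[new_start:] + seq[:new_start]


# Toy genome: a C-rich half followed by a G-rich half (hypothetical).
genome = "C" * 3000 + "G" * 3000
origin = skew_minimum_position(genome)
print(origin)                      # → 3000
print(rotate(genome, origin)[:5])  # → GGGGG
```

Rotating the sequence only changes coordinates, not content, so any annotation must be shifted by the same offset (modulo the genome length).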

