Computational protocol: Genomic investigation of a suspected outbreak of Legionella pneumophila ST82 reveals undetected heterogeneity by the present gold-standard methods, Denmark, July to November 2014

Protocol publication

[…] Genomic DNA used for SBT was also used for whole genome sequencing (WGS), either on the Illumina MiSeq platform to obtain 251-bp paired-end reads according to the manufacturer's instructions, or on the Illumina HiSeq platform with 100-bp paired-end reads. The isolates of the four ST82 cases were initially analysed together with three other epidemiologically unrelated community-acquired ST82 isolates from Denmark: two cases from Funen in 2011 and one case from Jutland in 2012. None of the 488 available genomes was ST82 and eligible for inclusion. However, additional sequences were subsequently included in the analysis: one sequence from an ST82 isolate from Jutland, Denmark, in 2015, and four ST82 sequences from the United Kingdom (UK), one of which originated from an isolate of a travel-associated case. This resulted in a total set of 12 whole genome sequences for the investigation.
Identification of single nucleotide polymorphism (SNP) variants was performed using NASP 1.0 by aligning sequence reads from the 12 Legionella isolates against the chromosome of L. pneumophila subsp. pneumophila str. Lorraine (GenBank accession number: NC_FQ958210) using the Burrows-Wheeler Aligner (BWA), after removal of duplicated regions in the reference using NUCmer. The Lorraine strain was chosen as reference because it was the closest closed reference available (as determined by k-mer analysis); the Lorraine strain is ST47 and shares four of the seven SBT loci with ST82: flaA (allele number 5), asd (number 22), proA (number 6) and neuA (number 6).
Variants were identified using the Genome Analysis Toolkit (GATK) UnifiedGenotyper, and all SNPs that did not meet a minimum coverage of 10 or that were present in < 90% of the base calls were excluded. High-density regions of SNPs, including those derived from recombination events, were removed using Genealogies Unbiased By recomBinations In Nucleotide Sequences (Gubbins) v1.4.4 with default settings.
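The coverage and base-call thresholds described above can be illustrated with a minimal Python sketch. It is not the NASP or GATK implementation: the `SnpCall` record, its field names and the `filter_snps` helper are hypothetical, and only the two thresholds (coverage of at least 10, SNP present in at least 90% of base calls) come from the protocol.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_COVERAGE = 10   # minimum read depth stated in the protocol
MIN_PERCENT = 90    # minimum percentage of base calls supporting the SNP


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site record; real pipelines read this from VCF output."""
    position: int    # position on the reference chromosome
    depth: int       # number of base calls covering the site
    alt_count: int   # number of base calls supporting the variant


def passes_filter(call: SnpCall) -> bool:
    """Return True if the call meets both thresholds."""
    if call.depth < MIN_COVERAGE:
        return False
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    return call.alt_count * 100 >= MIN_PERCENT * call.depth


def filter_snps(calls: Iterable[SnpCall]) -> List[SnpCall]:
    """Keep only the calls that pass the coverage and proportion filters."""
    return [call for call in calls if passes_filter(call)]


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        SnpCall(position=100, depth=30, alt_count=30),  # kept
        SnpCall(position=200, depth=8, alt_count=8),    # dropped: coverage < 10
        SnpCall(position=300, depth=40, alt_count=30),  # dropped: 75% support
        SnpCall(position=400, depth=10, alt_count=9),   # kept: exactly 90%
    ]
    for call in filter_snps(example):
        print(call.position)
```

The sketch treats a site at exactly 90% support as retained, since the protocol excludes only SNPs present in fewer than 90% of the base calls.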
Phylogenetic trees were constructed using the maximum-likelihood algorithm implemented in PhyML, with Smart Model Selection using the Bayesian Information Criterion and 100 bootstrap replicates. The Illumina sequences generated from the 8 Danish L. pneumophila isolates described in this study were submitted to the European Nucleotide Archive (ENA); accession numbers are listed under study ID PRJEB21315. Accessory genomes were analysed using reads assembled with SPAdes v. 3.5.0. Prokka v. 1.2 was used for gene annotation and Roary v. 3.6.0 was applied to define gene ‘presence/absence’. Results were inspected manually. […]
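As an aid to the manual inspection of gene presence/absence, the following Python sketch separates genes found in every isolate from those found in only some. It assumes a tab-delimited 0/1 matrix in the layout of Roary's `gene_presence_absence.Rtab` (a `Gene` column followed by one column per isolate). The `split_core_accessory` function and the isolate and gene names are hypothetical, and "present in every isolate" is a simplification of Roary's own core-genome threshold.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_core_accessory(rtab_text: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split genes into core (present in all isolates) and accessory.

    Returns a list of core gene names and a dict mapping each accessory
    gene to the isolates that carry it.
    """
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)
    isolates = header[1:]  # first column holds the gene name

    core: List[str] = []
    accessory: Dict[str, List[str]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, flags = row[0], row[1:]
        carriers = [iso for iso, flag in zip(isolates, flags) if flag == "1"]
        if len(carriers) == len(isolates):
            core.append(gene)
        else:
            accessory[gene] = carriers
    return core, accessory


if __name__ == "__main__":
    # Made-up matrix with hypothetical isolate and gene names.
    example = (
        "Gene\tisolate_A\tisolate_B\tisolate_C\n"
        "gene_1\t1\t1\t1\n"
        "gene_2\t1\t0\t1\n"
        "gene_3\t0\t0\t1\n"
    )
    core_genes, accessory_genes = split_core_accessory(example)
    print("core:", core_genes)
    print("accessory:", accessory_genes)
```

Listing the carriers of each accessory gene makes it easier to see whether a gene is confined to one epidemiological group of isolates.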

Pipeline specifications