Computational protocol: Salmonella Enteritidis ST183: emerging and endemic biotypes affecting western European hedgehogs (Erinaceus europaeus) and people in Great Britain

Protocol publication

[…] WGS single nucleotide polymorphism (SNP) phylogenetic analysis was performed on the hedgehog Salmonella spp. isolates and on human ST183 isolates for which WGS data were available. Briefly, phylogenies were determined by mapping the short reads against the S. Enteritidis P125109 reference genome (NCBI accession AM933172) using BWA mem v0.7.12; SNPs were then called using GATK v2.6.5 in UnifiedGenotyper mode. A consensus genome was called for each isolate, with positions with <10 reads mapped or a mapping quality of <30 defined as Ns. SNPs at positions that were present (i.e. not N) in 80% of isolates and which passed quality criteria of >90% consensus, minimum depth 10x, GQ >= 30 and MQ >= 30 in at least one strain were extracted and used to derive a consensus genome for each isolate. Regions of the reference genome corresponding to prophage regions, as identified by PHAST, were masked. These regions were 920105-949171, 1013332-1021872, 1222614-1257489, 1453431-1492466 and 2007699-2072028 of the AM933172.1 reference. This whole genome alignment was then used as input for Gubbins to remove potential recombinant regions, and the Gubbins-filtered polymorphic positions were used to generate a maximum likelihood phylogeny with IQ-TREE v1.3.10. Phylogenies were visualised with Phandango. Raw FASTQ data were uploaded to the NCBI SRA under BioProject PRJNA248792, with sample-specific accessions available in the Supplementary Table.

The relationship between spatial distance and genetic distance for ST183 was calculated in the following way (after Figure 7 of the work cited in the original publication). A SNP pairwise distance matrix was produced for all available ST183 isolates from humans. Then a pairwise distance matrix of straight-line geographic distance in kilometres was produced for all human isolates with an associated postcode: the postal area (first half of the postcode) was translated into an XY co-ordinate, and the distance between each pair was calculated according to Pythagoras’ theorem. 
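The two pairwise matrices described above can be sketched as follows. This is a minimal illustration, not the authors' code: the postal-area labels and coordinates are made up, the coordinates are assumed to be planar and in metres, and skipping positions called as N when counting SNP differences is an assumption rather than something the text states.

```python
import math
from itertools import combinations

# Hypothetical postal-area centroids as planar XY coordinates in metres.
# Labels and values are illustrative only.
AREA_XY = {
    "AB1": (0.0, 0.0),
    "CD2": (3000.0, 4000.0),
    "EF3": (30000.0, 40000.0),
}


def straight_line_km(a, b):
    """Pythagorean distance between two XY points in metres, returned in km."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) / 1000.0


def snp_distance(seq_a, seq_b):
    """Count differing positions, skipping sites that are N in either sequence.

    Ignoring N sites is an assumption made for this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("consensus sequences must have the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y and "N" not in (x, y))


def pairwise(items, dist):
    """Return {(id_i, id_j): distance} for every unordered pair of ids."""
    return {
        (i, j): dist(items[i], items[j])
        for i, j in combinations(sorted(items), 2)
    }


geo_km = pairwise(AREA_XY, straight_line_km)
```

Keying both matrices by the same sorted pair of isolate identifiers keeps them directly comparable in the threshold analysis that follows.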
Then, for each SNP distance threshold in the range 0-100, the number of pairs with a SNP distance less than or equal to the threshold that were also within 30 km of each other was expressed as a proportion of the total number of pairs with a SNP distance less than or equal to the threshold. One hundred random permutations of the same data were used to show signal over noise. To address the hypothesis that ST183 had a stronger geographical relationship than ST11, a control set of 71 ST11 isolates was analysed. It was not feasible to carry out the permutation analysis with the entire collection of ST11 in the PHE database, as this consisted of thousands of isolates. The control set was selected on the following criteria: (1) a pairwise SNP distance distribution similar to that of all ST11 (Supplementary Figure) and (2) a maximum pairwise SNP distance similar to that within ST183.

The demography of human infections with PT11 and all non-PT11 S. Enteritidis infections was summarised. The home address associated with each human ST183 isolate and with 2060 ST11 isolates from 1 January to 31 December 2015 was classified as rural or urban based on the patient postcode using ArcGIS v10.

Antimicrobial resistance determinants were sought using ‘Genefinder’, a PHE program that uses Bowtie 2 to map reads to a set of reference sequences representing typically acquired antibiotic resistance genes and chromosomal regions involved in antibiotic resistance, and SAMtools to generate an mpileup file. The mapping data were parsed, and reference sequences with 100% coverage, >85% base-call variation and >90% nucleotide identity were called as present in the genome, while allowing for detection of novel sub-types. 
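The presence criteria in the preceding sentence amount to a three-way threshold test, sketched below. Genefinder is an internal PHE program, so the `MappingSummary` record and its field names are hypothetical stand-ins for whatever it parses from the pileup. The text does not define the base-call metric, so the sketch simply carries a reported value against the stated >85% cut-off.

```python
from dataclasses import dataclass


@dataclass
class MappingSummary:
    """Hypothetical per-reference summary; not Genefinder's real output format."""

    reference: str
    coverage_pct: float    # share of the reference covered by mapped reads
    base_call_pct: float   # base-call metric as reported (definition assumed)
    identity_pct: float    # nucleotide identity to the reference


def is_present(m: MappingSummary) -> bool:
    """Apply the stated cut-offs: 100% coverage, >85% base call, >90% identity."""
    return (
        m.coverage_pct >= 100.0
        and m.base_call_pct > 85.0
        and m.identity_pct > 90.0
    )


def call_present(summaries):
    """Return the names of reference sequences that pass all three cut-offs."""
    return [m.reference for m in summaries if is_present(m)]
```

Because identity only needs to exceed 90%, a sequence that passes without matching its reference exactly is how a novel sub-type would surface under these criteria.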
β-lactamase variants were determined with 100% identity using the reference sequences downloaded from the Lahey or NCBI β-lactamase data resources. Known acquired resistance genes and resistance-conferring mutations relevant to β-lactams (including carbapenems), fluoroquinolones, aminoglycosides, chloramphenicol, macrolides, sulphonamides, tetracyclines, trimethoprim, rifamycins and fosfomycin were included in the analysis. […]
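Returning to the spatial analysis described earlier, the SNP-threshold proportion and its permutation baseline can be sketched as follows. The text says only that 100 random permutations of the same data were used. Shuffling which location belongs to which isolate is one plausible reading and is an assumption here, as are the toy distances and all function names.

```python
import random


def _key(a, b):
    """Canonical (sorted) key for an unordered pair of isolate ids."""
    return (a, b) if a < b else (b, a)


def proportion_within(snp, geo, threshold, max_km=30.0):
    """Of pairs with SNP distance <= threshold, the fraction within max_km."""
    close = [p for p, d in snp.items() if d <= threshold and p in geo]
    if not close:
        return None  # no pairs at or below this threshold
    return sum(1 for p in close if geo[p] <= max_km) / len(close)


def curve(snp, geo, thresholds=range(0, 101)):
    """Proportion at every SNP threshold from 0 to 100 inclusive."""
    return [proportion_within(snp, geo, t) for t in thresholds]


def permute_locations(geo, rng):
    """Reassign locations among isolates (assumed permutation scheme).

    Requires geo to hold a distance for every pair of the ids it mentions.
    """
    ids = sorted({i for pair in geo for i in pair})
    shuffled = ids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(ids, shuffled))
    return {p: geo[_key(relabel[p[0]], relabel[p[1]])] for p in geo}


def permuted_curves(snp, geo, n_perm=100, seed=0):
    """Curves from n_perm location shuffles, giving a noise baseline."""
    rng = random.Random(seed)
    return [curve(snp, permute_locations(geo, rng)) for _ in range(n_perm)]


# Toy distances for four hypothetical isolates (SNPs and km).
SNP = {("A", "B"): 0, ("A", "C"): 5, ("A", "D"): 50,
       ("B", "C"): 5, ("B", "D"): 50, ("C", "D"): 60}
GEO = {("A", "B"): 1.0, ("A", "C"): 10.0, ("A", "D"): 200.0,
       ("B", "C"): 40.0, ("B", "D"): 210.0, ("C", "D"): 190.0}

observed = curve(SNP, GEO)
baseline = permuted_curves(SNP, GEO)
```

Plotting `observed` against the spread of `baseline` at each threshold would show whether closely related isolates cluster geographically more than chance alone would predict.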

Pipeline specifications

Software tools: BWA, GATK, PHAST, Gubbins, IQ-TREE, Bowtie, SAMtools
Databases: SRA
Application: Phylogenetics
Organisms: Salmonella enterica subsp. enterica serovar Enteritidis, Erinaceus europaeus, Homo sapiens
Diseases: Infection, Salmonella Infections, Zoonoses