Computational protocol: Genomic Epidemiology of a Protracted Hospital Outbreak Caused by a Toxin A-Negative Clostridium difficile Sublineage PCR Ribotype 017 Strain in London, England

Protocol publication

[…] The sequence data were processed according to a standard pipeline as previously described. Briefly, FASTQ-formatted sequencing reads were quality trimmed with Trimmomatic, using a minimum Phred quality score of 30 as a rolling average over 4 bases. The resulting reads were mapped with BWA-MEM against the M68 C. difficile reference strain (4,308,325 bp). Most trimmed reads mapped to the reference (median, 98.1%; range, 94.3% to 99.8%), with a mean depth of 64.6 bp per sample (range, 38.3 bp to 114.2 bp) and a median of 61,000 loci with no base call made (1.4% of the entire genome; range, 0.4% to 3%). SNP loci with a minimum quality score of 30 were identified using SAMtools. SNPs were retained in samples where the read depth was >60 and >70% of reads supported the same allele (99.8% of SNPs were supported by >90% of contributing reads). In total, 748 SNPs were identified at 162 biallelic SNP loci: 94 (58.0%) loci coded for nonsynonymous genic changes, 21 (13.0%) loci coded for synonymous genic changes, and 47 (29.0%) loci were in nongenic regions. In addition, Velvet and VelvetOptimiser were used to assemble the trimmed reads de novo into contigs. Thirty-five high-quality assemblies were produced. Optimal k-mers fell between 63 and 81 bp. The mean N50 was >1 Mbp (range, 89 kbp to 4.2 Mbp). The mean longest contig was 1,122,300 bp (including 3 samples with >4 Mbp [96%] of the genome in a single contig). The two remaining assemblies produced contigs that equated to >97% of the genome but were highly fragmented for all postanalyses. Pipeline analysis, postanalyses, and genetic and phylogenetic analysis were carried out using Perl, R, ABACAS, Prokka, and RAxML software. […]
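The SNP thresholds and the N50 statistic quoted above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used SAMtools, Perl, and R. The `SnpCall` record, its field names, and the `passes_filters` and `n50` helpers are hypothetical names introduced only for this illustration, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical record for one candidate SNP in one sample."""
    position: int   # 1-based coordinate on the reference
    quality: float  # Phred-scaled call quality
    depth: int      # number of reads covering the site
    alt_reads: int  # number of reads supporting the called allele


def passes_filters(call, min_quality=30, min_depth=60, min_fraction=0.70):
    """Apply the thresholds quoted in the text: quality of at least 30,
    depth strictly greater than 60, and more than 70% of reads agreeing."""
    if call.depth <= min_depth or call.quality < min_quality:
        return False
    return call.alt_reads / call.depth > min_fraction


def n50(contig_lengths):
    """Return the contig length at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    # Made-up calls: only the first satisfies all three thresholds.
    calls = [
        SnpCall(position=1200, quality=45.0, depth=80, alt_reads=78),
        SnpCall(position=5300, quality=45.0, depth=80, alt_reads=50),
        SnpCall(position=9100, quality=45.0, depth=40, alt_reads=40),
    ]
    print([c.position for c in calls if passes_filters(c)])
    print(n50([500, 300, 200, 100]))
```

The comparisons are strict for depth and allele fraction because the text states ">60" and ">70%", and inclusive for quality because the text states a minimum of 30. Whether the original pipeline treated the boundaries the same way is an assumption.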

Pipeline specifications

Software tools: Trimmomatic, BWA, SAMtools, VelvetOptimiser, ABACAS, Prokka, RAxML
Applications: Phylogenetics, De novo sequencing analysis
Organisms: Clostridioides difficile, Homo sapiens
Diseases: Infection