Computational protocol: Comparison of Multilocus Variable-Number Tandem-Repeat Analysis and Whole-Genome Sequencing for Investigation of Clostridium difficile Transmission

[…] DNA for sequencing was extracted using a commercial kit (QIAamp [Qiagen, Hilden, Germany] or QuickGene [Fujifilm, Tokyo, Japan]). Samples were prepared for sequencing using standard Illumina (San Diego, CA) and adapted protocols. Pools of 96 samples were sequenced at the Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom, on the Illumina HiSeq 2000 platform, generating 100-bp reads. Properly paired sequence reads were mapped using Stampy version 1.0.17 (without Burrows-Wheeler Aligner premapping, using an expected substitution rate of 0.01) to the C. difficile 630 reference genome (GenBank accession no. AM180355.1).

Single nucleotide variants (SNVs) were identified across all mapped nonrepetitive sites using SAMtools (version 0.1.18) mpileup with the extended base-alignment quality flag, after parameter tuning was performed based on bacterial sequences (options -E -M0 -Q25 -q30 -m2 -D -S; other values were defaults). Repetitive regions were identified using BLAST searches of the reference genome using fragments of the same genome. GATK version 1.4.21 was used to create variant call format (VCF) files of the annotated variant sites. We used only SNVs that were supported by ≥5 reads, including at least one in each direction. A consensus of ≥90% of high-quality bases (Phred-scaled quality, ≥25) was also required to support an SNV, and calls had to be homozygous under a diploid model. The calls also required the proportion of bases of quality ≥25 in reads spanning the site of interest to be ≥0.35. A median (interquartile range [IQR]) of 84.0% (83.8% to 84.8%) of the C. difficile 630 reference genome was called across all sequenced isolates.

Adjustment for any clustering of SNVs that was suggestive of recombination was undertaken using the method described in Golubchik et al. The parameter values for the recombination adjustment were obtained from Didelot et al. In cases in which the parameters were not available for a specific lineage, they were obtained from the genetically closest lineage. […]
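
The SNV acceptance criteria stated above (read support, strand representation, high-quality consensus, homozygosity, and high-quality base proportion) can be illustrated with a minimal Python sketch. It is not the authors' code: the `SiteCall` record and its field names are hypothetical, and it assumes that supporting reads are counted among high-quality bases (Phred ≥25) and that the consensus is the fraction of high-quality bases agreeing with the variant.

```python
from dataclasses import dataclass

# Thresholds taken from the protocol text.
MIN_SUPPORTING_READS = 5   # >= 5 reads supporting the SNV
MIN_CONSENSUS = 0.90       # >= 90% of high-quality bases agree with the call
MIN_HQ_FRACTION = 0.35     # >= 0.35 of bases spanning the site have quality >= 25


@dataclass
class SiteCall:
    """Per-site summary. Field names are hypothetical, for illustration only."""
    fwd_alt_reads: int   # high-quality forward-strand reads supporting the variant
    rev_alt_reads: int   # high-quality reverse-strand reads supporting the variant
    hq_bases: int        # bases at the site with Phred quality >= 25
    total_bases: int     # all bases in reads spanning the site
    homozygous: bool     # call is homozygous under the diploid model


def passes_filters(site: SiteCall) -> bool:
    """Return True if the site meets every criterion listed in the protocol."""
    supporting = site.fwd_alt_reads + site.rev_alt_reads
    if supporting < MIN_SUPPORTING_READS:
        return False
    # At least one supporting read in each direction.
    if site.fwd_alt_reads < 1 or site.rev_alt_reads < 1:
        return False
    # Guard against division by zero on uncovered sites.
    if site.hq_bases == 0 or site.total_bases == 0:
        return False
    # Assumed interpretation: consensus is measured among high-quality bases.
    if supporting / site.hq_bases < MIN_CONSENSUS:
        return False
    if site.hq_bases / site.total_bases < MIN_HQ_FRACTION:
        return False
    return site.homozygous
```

For example, a site with three forward and three reverse supporting reads, six high-quality bases out of ten, and a homozygous call passes, whereas the same site with all six supporting reads on one strand is rejected.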

Pipeline specifications

Software tools: Stampy, BWA, SAMtools, GATK
Application: WGS analysis
Organisms: Clostridioides difficile, Homo sapiens
Diseases: Clostridium Infections
Chemicals: Nucleotides