Computational protocol: Identification and pathogenomic analysis of an Escherichia coli strain producing a novel Shiga toxin 2 subtype

Protocol publication

[…] The stx subtypes of the STEC isolates were determined by a PCR-based subtyping method. For strains that were not detected by the stx2 subtype-specific primers, the complete stx2 gene was amplified as described previously, cloned into the vector pMD18-T, and transformed into E. coli JM109 (Takara, Dalian, China). About 10 transformants were selected for sequencing to determine whether a single PCR product contained multiple stx2 subtypes. Ninety-three representative reference nucleotide sequences of the full stx2 operon, covering the stx2 subtypes and variants stx2a to stx2g, were downloaded from GenBank as previously described. The amino acid sequences of the combined A and B subunits (the holotoxin) were translated from the open reading frames. The full nucleotide and amino acid sequences, including the A and B subunits and the intergenic region, were aligned and compared using Clustal Omega to evaluate the differences between stx2 sequences. Phylogenetic trees based on the holotoxin amino acid sequences were reconstructed with three algorithms (neighbor-joining, maximum likelihood and maximum parsimony) using MEGA 7 software, and the stability of the groupings was estimated by bootstrap analysis (1,000 replications). Genetic distances were calculated by the maximum composite likelihood method. [...]

Genomic DNA was isolated from an overnight culture using the Wizard Genomic DNA Purification Kit (Promega, USA) according to the manufacturer's instructions. The complete genome was sequenced by single-molecule real-time (SMRT) technology on the Pacific Biosciences (PacBio) sequencing platform. The data were assembled into a single gap-free circular genome using SMRT Analysis 2.3.0. Protein-coding sequences (CDSs), tRNAs and rRNAs were predicted using GeneMarkS. Prophages were predicted with the PHAge Search Tool (PHAST). Virulence factors were predicted using the NCBI BLAST tool and the virulence factor database (VFDB) [...]
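The holotoxin translation step can be illustrated with a minimal Python sketch. It assumes each open reading frame begins at an ATG start codon and uses the standard genetic code (NCBI table 1; the bacterial table 11 assigns the same amino acids and differs only in permitted start codons). The names `translate_orf` and `holotoxin_sequence` are hypothetical helpers, not part of the original pipeline, and the toy ORFs are not real stx2 sequences; in practice a library routine such as Biopython's `Seq.translate` would normally be used.

```python
# Standard genetic code (NCBI translation table 1), indexed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_orf(dna: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon.

    Codons containing ambiguous bases (e.g. N) are translated as 'X'.
    """
    seq = dna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def holotoxin_sequence(a_subunit_orf: str, b_subunit_orf: str) -> str:
    """Concatenate the translated A and B subunits into one holotoxin sequence."""
    return translate_orf(a_subunit_orf) + translate_orf(b_subunit_orf)


if __name__ == "__main__":
    # Toy ORFs for illustration only.
    print(holotoxin_sequence("ATGAAATGA", "ATGTGGTAA"))  # MKMW
```

Concatenating A and B subunits gives one sequence per toxin variant, which is what the holotoxin-based trees are built from.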
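To show how differences between aligned stx2 sequences can be quantified, the sketch below computes pairwise percent identity from an alignment such as the one produced by Clustal Omega. This is a simple identity measure swapped in for illustration; it is not the maximum composite likelihood distance computed in MEGA. The gap convention (gap-gap columns skipped, gap-residue columns counted as mismatches) and the helper names are assumptions.

```python
from itertools import combinations


def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identity between two sequences taken from the same alignment.

    Columns where both sequences have a gap are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        matches += a == b  # bool adds as 0 or 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise percent identity for every pair of sequences in an alignment."""
    return {
        (name1, name2): percent_identity(alignment[name1], alignment[name2])
        for name1, name2 in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Toy aligned holotoxin fragments, not real data.
    toy = {"seq_A": "MKV-L", "seq_B": "MKVAL", "seq_C": "MRVAL"}
    for pair, identity in identity_matrix(toy).items():
        print(pair, f"{identity:.1f}")
```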
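The virulence-factor step is described only as BLAST searches against VFDB, without thresholds. A minimal sketch of how such hits might be filtered follows, assuming the predicted CDSs were searched with BLAST+ using `-outfmt 6` (the standard 12-column tabular output). The 80% identity and 1e-5 e-value cut-offs, the helper names, and the toy hit IDs are all hypothetical and not taken from the protocol.

```python
from typing import Iterable, Iterator

# Column order of BLAST+ tabular output produced with -outfmt 6 (default fields).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_outfmt6(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per hit, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))


def best_vf_hits(lines: Iterable[str], min_identity: float = 80.0,
                 max_evalue: float = 1e-5) -> dict[str, str]:
    """Map each query CDS to its highest-bitscore hit passing the
    (hypothetical) identity and e-value cut-offs."""
    best: dict[str, tuple[str, float]] = {}
    for hit in parse_outfmt6(lines):
        if float(hit["pident"]) < min_identity or float(hit["evalue"]) > max_evalue:
            continue
        bitscore = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    toy_hits = [
        "cds1\tvfA\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-100\t500\n",
        "cds1\tvfB\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t400\n",
        "cds2\tvfC\t60.0\t300\t120\t0\t1\t300\t1\t300\t1e-20\t150\n",
    ]
    print(best_vf_hits(toy_hits))  # {'cds1': 'vfA'}
```

Keeping only the best-scoring hit per CDS avoids reporting the same gene several times when VFDB contains close homologues.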
The sequence of the Stx-converting phage was extracted from the complete genome using PHASTER, reannotated using the RAST server, and then manually verified and corrected. Functional annotation of selected CDSs was based on BLASTP homology searches against the public nonredundant protein database. The gene adjacent to the integrase was designated as the phage insertion site. The full Stx2 phage sequence of STEC299 was compared in detail with representative Stx2-converting phages and visualized using a Perl script. The Stx2 phage sequences of the reference strains used in the current study were kindly provided by Dr. David A. Rasko, University of Maryland School of Medicine. [...]

To generate a robust, high-resolution phylogenomic tree showing the position of the novel Stx2-converting strain STEC299, its genome was compared with 32 complete E. coli/Shigella spp. genomes representing all the major pathotypes (Table ) using two strategies: ribosomal protein gene sequence analysis (rMLST) and whole-genome multilocus sequence typing (wgMLST). The ribosomal protein subunit (rps) gene sequences were extracted from the annotated whole-genome sequences of the 33 strains. Three independent ClonalFrame (version 1.2) runs were then carried out on the extracted rps gene sequences, and their outputs were checked for convergence and merged to generate a 95% consensus tree. For the wgMLST analysis, the complete genome sequence of strain EDL933 was used as the reference for an ad hoc wgMLST analysis with Genome Profiler version 2.0. The relationships among the strains were further analyzed with SplitsTree 4. The whole-genome phylogeny was inferred from the concatenated sequences of the loci shared by all 33 genomes, as identified in the wgMLST analysis.
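The rule "the gene adjacent to the integrase is the insertion site" can be made concrete with a minimal sketch. It assumes a linear list of annotated features with 1-based coordinates (start ≤ end regardless of strand), prophage boundaries such as those reported by PHASTER, and an integrase located at one end of the prophage. It takes the first integrase found, ignores wrap-around on the circular chromosome, and uses hypothetical names (`Feature`, `insertion_site`) and toy coordinates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    locus_tag: str
    start: int  # 1-based; start <= end regardless of strand
    end: int
    product: str


def insertion_site(features: list[Feature], phage_start: int,
                   phage_end: int) -> Optional[Feature]:
    """Return the host gene adjacent to the prophage on its integrase side."""
    ordered = sorted(features, key=lambda f: f.start)
    integrases = [
        f for f in ordered
        if phage_start <= f.start and f.end <= phage_end
        and "integrase" in f.product.lower()
    ]
    if not integrases:
        return None
    integrase = integrases[0]
    if integrase.start - phage_start <= phage_end - integrase.end:
        # Integrase lies at the left end: nearest gene upstream of the prophage.
        upstream = [f for f in ordered if f.end < phage_start]
        return max(upstream, key=lambda f: f.end) if upstream else None
    # Integrase lies at the right end: nearest gene downstream of the prophage.
    downstream = [f for f in ordered if f.start > phage_end]
    return downstream[0] if downstream else None


if __name__ == "__main__":
    toy = [
        Feature("host1", 100, 900, "hypothetical host protein"),
        Feature("phg_int", 1000, 2200, "Integrase"),
        Feature("phg_stxA", 2300, 3000, "Shiga toxin 2 subunit A"),
        Feature("host2", 5100, 5900, "hypothetical host protein"),
    ]
    print(insertion_site(toy, 1000, 5000).locus_tag)  # host1
```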
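The final wgMLST step, building a whole-genome alignment from the loci shared by all genomes, amounts to a set intersection followed by concatenation. The sketch below assumes the per-locus allele sequences for each genome are already aligned (equal length per locus across genomes) and stored as nested dictionaries; this data layout, the helper names and the toy genomes are hypothetical and do not reflect Genome Profiler's actual output format.

```python
def shared_loci(genomes: dict[str, dict[str, str]]) -> list[str]:
    """Loci present in every genome, in a fixed (sorted) order."""
    locus_sets = [set(alleles) for alleles in genomes.values()]
    return sorted(set.intersection(*locus_sets)) if locus_sets else []


def concatenate_loci(genomes: dict[str, dict[str, str]]) -> dict[str, str]:
    """Concatenate the shared loci of each genome into one alignment row."""
    loci = shared_loci(genomes)
    for locus in loci:
        # Concatenation only yields a valid alignment if each locus is aligned.
        lengths = {len(alleles[locus]) for alleles in genomes.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus} is not aligned (lengths {sorted(lengths)})")
    return {name: "".join(alleles[locus] for locus in loci)
            for name, alleles in genomes.items()}


if __name__ == "__main__":
    toy = {
        "genome_A": {"l1": "AC", "l2": "GT", "l3": "AA"},
        "genome_B": {"l1": "AT", "l3": "AG"},  # l2 missing, so l2 is excluded
        "genome_C": {"l1": "AC", "l2": "GG", "l3": "CA"},
    }
    print(concatenate_loci(toy))
```

Sorting the shared loci fixes the column order, so every genome's row places the same locus at the same alignment positions.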
Regions with elevated densities of base substitutions were identified and removed, and the phylogenetic relationships were inferred using Gubbins. […]
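Gubbins iteratively reconstructs a phylogeny and applies a spatial scan statistic to the substitutions reconstructed on each branch. The sketch below swaps in a much cruder idea purely to illustrate what "elevated substitution density" means: a genome-wide sliding window flags regions whose SNP count exceeds a fixed multiple of the uniform expectation, and SNPs in those regions are masked. The window size, step, fold threshold, 0-based positions and helper names are all hypothetical; this is not the Gubbins algorithm.

```python
from bisect import bisect_left


def dense_regions(snps: list[int], genome_length: int, window: int = 1000,
                  step: int = 100, fold: float = 5.0) -> list[tuple[int, int]]:
    """Merged [start, end) windows whose SNP count exceeds `fold` times the
    count expected if SNPs were spread uniformly over the genome."""
    positions = sorted(snps)
    expected = len(positions) * window / genome_length
    regions: list[tuple[int, int]] = []
    for start in range(0, max(genome_length - window, 0) + 1, step):
        end = start + window
        count = bisect_left(positions, end) - bisect_left(positions, start)
        if count > fold * expected:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend overlapping region
            else:
                regions.append((start, end))
    return regions


def mask_snps(snps: list[int], regions: list[tuple[int, int]]) -> list[int]:
    """Drop SNPs that fall inside any flagged region."""
    return [p for p in snps if not any(s <= p < e for s, e in regions)]


if __name__ == "__main__":
    background = list(range(500, 10000, 1000))  # 10 evenly spaced SNPs
    cluster = list(range(4000, 4100, 5))       # 20 clustered SNPs
    snps = background + cluster
    regions = dense_regions(snps, genome_length=10000)
    print(regions, len(mask_snps(snps, regions)))  # [(3100, 5000)] 8
```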

Pipeline specifications

Software tools: Clustal Omega, SMRT Analysis, GeneMarkS, PHAST, PHASTER, RAST, BLASTP, rMLST, ClonalFrame, Gubbins
Databases: NMPDR
Applications: Genome annotation, Phylogenetics, WGS analysis
Organisms: Escherichia coli
Chemicals: Amino Acids, Mitomycin